OBJECTIVE - To describe the development and evaluation of computational tools to identify concepts within medical curricular documents, using information derived from the National Library of Medicine's Unified Medical Language System (UMLS). The long-term goal of the KnowledgeMap (KM) project is to provide faculty and students with an improved ability to develop, review, and integrate components of the medical school curriculum.
DESIGN - The KM concept identifier uses lexical resources partially derived from the UMLS (SPECIALIST lexicon and Metathesaurus), heuristic language processing techniques, and an empirical scoring algorithm. KM differentiates among potentially matching Metathesaurus concepts within a source document. The authors manually identified important "gold standard" biomedical concepts within selected medical school full-content lecture documents and used these documents to compare KM concept recognition with that of a known state-of-the-art "standard"-the National Library of Medicine's MetaMap program.
MEASUREMENTS - The number of "gold standard" concepts in each lecture document identified by either KM or MetaMap, and the cause of each failure or relative success in a random subset of documents.
RESULTS - For 4,281 "gold standard" concepts, MetaMap matched 78% and KM 82%. Precision for "gold standard" concepts was 85% for MetaMap and 89% for KM. The heuristics of KM accurately matched acronyms, concepts underspecified in the document, and ambiguous matches. The most frequent cause of matching failures was absence of target concepts from the UMLS Metathesaurus.
CONCLUSION - The prototypic KM system provided an encouraging rate of concept extraction for representative medical curricular texts. Future versions of KM should be evaluated for their ability to allow administrators, lecturers, and students to navigate through the medical curriculum to locate redundancies, find interrelated information, and identify omissions. In addition, the ability of KM to meet specific, personal information needs should be assessed.