Extracting epidemiologic exposure and outcome terms from literature using machine learning approaches.

Lu Y, Xu H, Peterson NB, Dai Q, Jiang M, Denny JC, Liu M
Int J Data Min Bioinform. 2012; 6 (4): 447-59

PMID: 23155773 · DOI: 10.1504/ijdmb.2012.049284

Much epidemiologic information resides in the literature, which is not in a computable format. To extract this information and build knowledge bases of epidemiologic studies, we developed a system that extracts noun phrases describing epidemiologic exposures and outcomes. The system consists of two components: a natural language processing (NLP) engine and a machine learning (ML) based classifier. Four ML algorithms were applied and compared across different feature sets. To evaluate the system's performance, we manually constructed an annotated dataset. The system achieved a highest F-measure of 82.0% for extracting exposure terms and 70% for extracting outcome terms.
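
As a rough illustration of the evaluation metric reported above, the sketch below computes precision, recall, and F-measure for a set of extracted terms against a manually annotated gold set. It assumes exact string matching and the balanced F-measure (F1); the abstract does not state the matching criterion or the weighting used, so both are assumptions. The function name `f_measure` and the toy term sets are hypothetical and are not data from the study.

```python
def f_measure(gold, predicted):
    """Return (precision, recall, F1) under exact-match scoring (an assumption)."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)  # terms found in both sets
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy annotations, not taken from the paper's dataset.
gold_exposures = {"alcohol consumption", "smoking",
                  "body mass index", "physical activity"}
predicted_exposures = {"alcohol consumption", "smoking",
                       "red meat intake", "physical activity"}

precision, recall, f1 = f_measure(gold_exposures, predicted_exposures)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Scoring exposure and outcome terms separately, as the abstract reports them, would amount to calling the function once per term type.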

MeSH Terms (7)

Algorithms · Artificial Intelligence · Epidemiologic Factors · Humans · Information Storage and Retrieval · Knowledge Bases · Natural Language Processing
