The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
New therapeutic approaches are needed for gestational diabetes mellitus (GDM), but they must show safety and efficacy in a historically understudied population. We studied associations between electronic medical record (EMR) phenotypes and genetic variants to uncover drugs currently considered safe in pregnancy that could treat or prevent GDM. We identified 129 systemically active drugs considered safe in pregnancy that target the proteins produced by 196 genes. We tested for associations between GDM and/or type 2 diabetes (DM2) and 306 SNPs in 130 genes represented on the Illumina Infinium HumanExome BeadChip (DM2 was included because it shares pathophysiological features with GDM). In parallel, we tested the association between drugs and glucose tolerance during pregnancy, as measured by the glucose value recorded during a routine 50-g glucose tolerance test (GTT). We found an association between GDM/DM2 and the genes targeted by 11 drug classes. In the EMR analysis, 6 drug classes were associated with changes in GTT glucose. Two classes were identified in both analyses. L-type calcium channel blocking antihypertensives (CCBs) were associated with a 3.18 mg/dL decrease in glucose during the GTT (95% CI -6.18 to -0.18), and serotonin receptor type 3 (5HT-3) antagonist antinausea medications were associated with a 3.54 mg/dL increase (95% CI 1.86 to 5.23). CCBs were thus identified as a class of drugs considered safe in pregnancy that could have efficacy in treating or preventing GDM, whereas 5HT-3 antagonists may be associated with worse glucose tolerance.
Copyright © 2018 Elsevier Ltd. All rights reserved.
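The EMR arm of the GDM study reports each drug class's association as a change in mean GTT glucose with a 95% confidence interval. The abstract does not describe the statistical model (for example, which covariates were adjusted for), so the following is only a minimal sketch, assuming an unadjusted comparison of exposed versus unexposed pregnancies with a normal-approximation interval. The function name and the glucose values are hypothetical.

```python
import math
from statistics import mean, variance


def mean_difference_ci(exposed, unexposed, z=1.96):
    """Unadjusted difference in mean GTT glucose (exposed minus unexposed)
    with a normal-approximation 95% CI using a Welch-style standard error.
    This is an illustrative simplification, not the study's model."""
    diff = mean(exposed) - mean(unexposed)
    se = math.sqrt(variance(exposed) / len(exposed)
                   + variance(unexposed) / len(unexposed))
    return diff, (diff - z * se, diff + z * se)


# Hypothetical 50-g GTT glucose values in mg/dL; not data from the study.
on_drug = [112, 118, 105, 121, 109, 114]
off_drug = [117, 125, 119, 110, 128, 122]

diff, (lo, hi) = mean_difference_ci(on_drug, off_drug)
print(f"difference = {diff:.2f} mg/dL, 95% CI {lo:.2f} to {hi:.2f}")
```

A negative difference whose interval excludes zero would correspond to the kind of glucose-lowering association reported for CCBs. A real analysis would likely use a regression model with covariates instead of a raw two-group contrast.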
OBJECTIVE - The traditional fee-for-service approach to healthcare can lead to the management of a patient's conditions in a siloed manner, with various negative consequences. It has been recognized that a bundled approach to healthcare, one that manages a collection of health conditions together, may enable greater efficacy and cost savings. However, it is not always evident which sets of conditions should be managed in a bundled manner. In this study, we investigate whether a data-driven approach can automatically learn potential bundles.
METHODS - We designed a framework to infer health condition collections (HCCs) based on the similarity of their clinical workflows, according to electronic medical record (EMR) utilization. We evaluated the framework with data from over 16,500 inpatient stays from Northwestern Memorial Hospital in Chicago, Illinois. The plausibility of the inferred HCCs for bundled care was assessed through an online survey of a panel of five experts, whose responses were analyzed via an analysis of variance (ANOVA) at a 95% confidence level. We further assessed the face validity of the HCCs using evidence in the published literature.
RESULTS - The framework inferred four HCCs, indicative of (1) fetal abnormalities, (2) late pregnancies, (3) prostate problems, and (4) chronic diseases, with congestive heart failure featuring prominently. Each HCC was substantiated with evidence in the literature and was deemed plausible for bundled care by the experts at a statistically significant level.
CONCLUSIONS - The findings suggest that an automated, EMR data-driven framework can provide a basis for discovering bundled care opportunities. Still, translating such findings into actual care management will require further refinement, implementation, and evaluation.
Copyright © 2017 Elsevier Inc. All rights reserved.
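The bundled-care framework groups health conditions by the similarity of their clinical workflows, as reflected in EMR utilization. The abstract does not name the similarity measure or the grouping procedure. The sketch below illustrates the general idea under stated assumptions. Each condition is represented as a vector of hypothetical EMR activity counts. Pairs whose cosine similarity meets a threshold are linked, and the connected components become candidate collections. Cosine similarity, the 0.8 threshold, and the toy profiles are all assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def group_conditions(profiles, threshold=0.8):
    """Link conditions whose utilization profiles have cosine similarity
    >= threshold, then return the connected components (union-find)."""
    parent = {c: c for c in profiles}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in combinations(profiles, 2):
        if cosine(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for c in profiles:
        groups.setdefault(find(c), set()).add(c)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical counts of EMR activity per area:
# [obstetric notes, fetal monitoring, cardiology orders, urology orders]
profiles = {
    "preterm labor": [9, 7, 0, 0],
    "fetal anomaly": [8, 9, 0, 1],
    "heart failure": [0, 0, 9, 1],
    "cardiomyopathy": [0, 1, 8, 0],
}
print(group_conditions(profiles))
```

In this sketch, the grouping depends on the threshold. A lower threshold merges more conditions into larger bundles, which is one reason expert review of the inferred collections, as in the study, matters.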
Objective - Understanding how to identify the social determinants of health from electronic health records (EHRs) could provide important insights into health and disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository.
Materials and Methods - We first constructed lexicons to capture homelessness and ACE phenotypic profiles. We employed word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles with iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results of 2544 patient visits relevant for homelessness and 1000 patients relevant for ACE.
Results - word2vec yielded better performance (area under the precision-recall curve [AUPRC] of 0.94) than lexical associations (AUPRC = 0.83) for extracting homelessness-related words. Searches for the 2 phenotypes performed better for homelessness (AUPRC = 0.95) than for ACE (AUPRC = 0.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and/or physical (50.6%) abuse, with the top-ranked abuser keywords being "father" (21.8%) and "mother" (15.4%). The most prevalent associated conditions were lack of housing (62.8%) and tobacco use disorder (61.5%) for homeless patients, and mental disorders (36.6%-47.6%) for ACE patients.
Conclusion - We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.
© The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
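The homelessness and ACE study scores its ranked search results with AUPRC, using assessors' manual judgments of the top-ranked results. The abstract does not say how the area was computed. A common estimator is average precision over a ranked list of relevance judgments, sketched below. It assumes every relevant item appears in the judged list, and the judgments are hypothetical.

```python
def average_precision(relevance):
    """Average precision over a ranked list of 0/1 relevance judgments,
    a common estimate of the area under the precision-recall curve.
    Assumes every relevant item appears somewhere in the ranked list."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / hits if hits else 0.0


# Hypothetical assessor judgments for the top 8 results, in rank order.
judgments = [1, 1, 0, 1, 1, 0, 0, 1]
print(round(average_precision(judgments), 3))
```

Average precision rewards rankings that place relevant notes early. This property matters when assessors can review only the top of a ranking drawn from millions of notes. Trapezoidal integration of the precision-recall curve gives slightly different values.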
We hypothesize that the relative mitochondrial copy number (MTCN) can be estimated by comparing the abundance of mitochondrial DNA reads to nuclear DNA reads in high-throughput sequencing data. To test this hypothesis, we examined relative MTCN across 13 breast cancer cell lines, using the RT-PCR-based NovaQUANT Human Mitochondrial to Nuclear DNA Ratio Kit as the gold standard. Six distinct computational approaches were used to estimate the relative MTCN for comparison with the RT-PCR measurements. The results demonstrate that relative MTCN estimated from exome sequencing data correlates well with the RT-PCR measurements, whereas estimates from RNA-seq data do not. Through analysis of copy number variants (CNVs) in The Cancer Genome Atlas, we show that the two nuclear genes used in the NovaQUANT assay to represent the nuclear genome often experience CNVs in tumor cells, calling into question the accuracy of this gold-standard method when it is applied to tumor cells.
Copyright © 2017 Elsevier Inc. All rights reserved.
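The MTCN study's premise is that mitochondrial copy number can be read off the ratio of mitochondrial to nuclear reads. The sketch below shows one simple read-density estimator. It is not necessarily any of the study's six approaches. It normalizes read counts by the length of the region that can produce them and scales by nuclear ploidy. The 16,569 bp length is that of the human mitochondrial reference (rCRS). The 50 Mb nuclear target size and the read counts are hypothetical. The estimator ignores mapping quality, nuclear mitochondrial segments (NUMTs), GC bias, and off-target exome reads.

```python
def relative_mtcn(mt_reads, mt_length, nuclear_reads, nuclear_length, ploidy=2):
    """Mitochondrial genomes per cell, estimated as
    ploidy * (mtDNA reads per base) / (nuclear reads per base).

    For exome data, nuclear_length should be the size of the captured
    target region rather than the whole genome (an assumption here)."""
    mt_density = mt_reads / mt_length
    nuclear_density = nuclear_reads / nuclear_length
    return ploidy * mt_density / nuclear_density


# Hypothetical exome run: 16,569 bp mtDNA, 50 Mb capture target.
print(relative_mtcn(331_380, 16_569, 50_000_000, 50_000_000))
```

The same arithmetic shows why CNVs in reference genes matter for the qPCR gold standard. If a nuclear reference gene is present at three copies rather than the assumed two, its signal rises 1.5-fold, and the measured mitochondrial-to-nuclear ratio falls by the same factor.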
Socioeconomic status (SES) is a fundamental contributor to health and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies, due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields such as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields but rather in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables for racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted by manual review of 50 randomly selected records were compared with data produced by the algorithms, yielding positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting that some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses.
Ultimately, incorporation of measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.
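The SES study extracts categories from free text with search terms and then scores each category by its positive predictive value (PPV) against manual review. The sketch below illustrates that pipeline with a tiny keyword matcher and a per-category PPV. The search terms, note text, and record IDs are hypothetical and do not reproduce the study's 830 terms. The negated note shows one plausible source of false positives in plain keyword search. The abstract does not say why some categories scored lower.

```python
import re

# Hypothetical search terms for illustration only.
SEARCH_TERMS = {
    "homelessness": ["homeless", "lives in a shelter", "no fixed address"],
    "medicaid": ["medicaid"],
    "unemployment": ["unemployed", "out of work"],
}

# One case-insensitive, word-bounded pattern per category.
PATTERNS = {
    cat: re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
                    re.IGNORECASE)
    for cat, terms in SEARCH_TERMS.items()
}


def extract_ses(note):
    """Return the set of SES categories whose search terms appear in the note."""
    return {cat for cat, pattern in PATTERNS.items() if pattern.search(note)}


def ppv(predicted, reviewed, category):
    """PPV = records correctly flagged / all records flagged for the category."""
    flagged = [rid for rid, cats in predicted.items() if category in cats]
    if not flagged:
        return None  # undefined when nothing was flagged
    true_pos = sum(1 for rid in flagged if category in reviewed.get(rid, set()))
    return true_pos / len(flagged)


notes = {
    "r1": "Patient is currently homeless and staying with friends.",
    "r2": "Insurance: Medicaid. Patient unemployed since March.",
    "r3": "Denies being homeless at this time.",  # negated mention
}
reviewed = {"r1": {"homelessness"}, "r2": {"medicaid", "unemployment"}, "r3": set()}

predicted = {rid: extract_ses(text) for rid, text in notes.items()}
for cat in SEARCH_TERMS:
    print(cat, ppv(predicted, reviewed, cat))
```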
Objective - The goal of this investigation was to determine whether automated approaches can learn patient-oriented care teams via utilization of an electronic medical record (EMR) system.
Materials and Methods - To perform this investigation, we designed a data-mining framework that relies on a combination of latent topic modeling and network analysis to infer patterns of collaborative teams. We applied the framework to the EMR utilization records of over 10 000 employees and 17 000 inpatients at a large academic medical center during a 4-month window in 2010. Next, we conducted an extrinsic evaluation of the patterns to determine the plausibility of the inferred care teams via surveys with knowledgeable experts. Finally, we conducted an intrinsic evaluation to contextualize each team in terms of collaboration strength (via a cluster coefficient) and clinical credibility (via associations between teams and patient comorbidities).
Results - The framework discovered 34 collaborative care teams, 27 (79.4%) of which were confirmed as administratively plausible. Of those, 26 teams depicted strong collaborations, with a cluster coefficient > 0.5. There were 119 diagnostic conditions associated with the 34 care teams. Additionally, to provide clarity on how the survey respondents arrived at their determinations, we worked with several oncologists to develop an illustrative example of how a certain team functions in cancer care.
Discussion - Inferred collaborative teams are plausible; translating such patterns into optimized collaborative care will require administrative review and integration with management practices.
Conclusions - EMR utilization records can be mined for collaborative care patterns in large complex medical centers.
© The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
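The care-team study measures collaboration strength with a cluster coefficient and treats values above 0.5 as strong. The abstract does not say how the team graph was built or whether edges were weighted. The sketch below uses the standard unweighted average local clustering coefficient. It assumes, hypothetically, that an edge joins two employees who accessed the same patients' records. Nodes with fewer than two neighbors contribute zero.

```python
from itertools import combinations


def average_clustering(adjacency):
    """Mean local clustering coefficient of an undirected graph given as
    {node: set_of_neighbors}. A node's coefficient is the fraction of its
    neighbor pairs that are themselves connected; nodes with fewer than
    two neighbors count as 0."""
    if not adjacency:
        return 0.0
    total = 0.0
    for node, nbrs in adjacency.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adjacency)


# Hypothetical four-member team: every pair collaborates except the
# physician and the pharmacist.
team = {
    "nurse": {"physician", "resident", "pharmacist"},
    "physician": {"nurse", "resident"},
    "resident": {"nurse", "physician", "pharmacist"},
    "pharmacist": {"nurse", "resident"},
}
score = average_clustering(team)
print(round(score, 3), "strong" if score > 0.5 else "weak")
```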
BACKGROUND - Clinical text contains valuable information but must be de-identified before it can be used for secondary purposes. Accurate annotation of personally identifiable information (PII) is essential to the development of automated de-identification systems and to manual redaction of PII. Yet the accuracy of annotations may vary considerably across individual annotators, and annotation is costly. Moreover, the marginal benefit of incorporating additional annotators has not been well characterized.
OBJECTIVES - This study models the costs and benefits of incorporating increasing numbers of independent human annotators to identify the instances of PII in a corpus. We used a corpus with gold standard annotations to evaluate the performance of teams of annotators of increasing size.
METHODS - Four annotators independently identified PII in a 100-document corpus consisting of randomly selected clinical notes from Family Practice clinics in a large integrated health care system. These annotations were pooled and validated to generate a gold standard corpus for evaluation.
RESULTS - When measured against the gold standard, recall rates for all PII types ranged from 0.90-0.98 for individual annotators to 0.998-1.0 for teams of three. Median cost per PII instance discovered during corpus annotation ranged from $0.71 for an individual annotator to $377 for annotations discovered only by a fourth annotator.
CONCLUSIONS - Incorporating a second annotator into a PII annotation process reduces unredacted PII and improves the quality of annotations to 0.99 recall, yielding clear benefit at reasonable cost; the cost advantages of annotation teams larger than two diminish rapidly.
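The annotator-team analysis compares the recall of teams of increasing size against a pooled gold standard. A natural way to model a team is as the union of its members' annotations. The sketch below assumes that model and averages recall over every team of each size drawn from the annotator pool. The PII instance IDs and annotator sets are hypothetical.

```python
from itertools import combinations
from statistics import mean


def team_recall(team, gold):
    """Recall of the union of a team's annotations against the gold standard."""
    found = set().union(*team) & gold
    return len(found) / len(gold)


def mean_recall_by_team_size(annotators, gold):
    """Average recall over every possible team of each size."""
    return {
        k: mean(team_recall(team, gold) for team in combinations(annotators, k))
        for k in range(1, len(annotators) + 1)
    }


# Hypothetical: 10 gold PII instances; each annotator misses a different one.
gold = set(range(10))
annotators = [
    set(range(9)),          # misses instance 9
    set(range(1, 10)),      # misses instance 0
    gold - {5},             # misses instance 5
]
print(mean_recall_by_team_size(annotators, gold))
```

Because independent annotators tend to miss different instances, the union closes most gaps at size two. This is consistent with the abstract's observation that benefits beyond a second annotator diminish rapidly while the cost per newly found instance rises.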
OBJECTIVE - Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems.
MATERIALS AND METHODS - We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites.
RESULTS - As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%).
DISCUSSION - These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others.
CONCLUSION - By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.
© The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
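The PheKB abstract reports that ICD codes are the most frequently used algorithm component, followed by medications and natural language processing. The sketch below shows how the first two components might be combined into a simple case rule. The criteria are hypothetical and are not taken from any PheKB algorithm. The rule requires ICD-10-CM codes beginning with E11 (type 2 diabetes) on at least two distinct dates plus a metformin prescription. Requiring codes on more than one date is one way to guard against one-off rule-out diagnoses.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PatientRecord:
    icd_codes: list[tuple[str, date]] = field(default_factory=list)  # (code, date)
    medications: set[str] = field(default_factory=set)


# Hypothetical criteria for illustration; not an actual PheKB algorithm.
CASE_ICD_PREFIXES = ("E11",)      # ICD-10-CM type 2 diabetes codes
CASE_MEDICATIONS = {"metformin"}


def is_case(record, min_code_dates=2):
    """Case if qualifying codes appear on >= min_code_dates distinct dates
    and at least one qualifying medication is recorded."""
    code_dates = {d for code, d in record.icd_codes
                  if code.startswith(CASE_ICD_PREFIXES)}
    has_med = bool(record.medications & CASE_MEDICATIONS)
    return len(code_dates) >= min_code_dates and has_med


patient = PatientRecord(
    icd_codes=[("E11.9", date(2015, 1, 5)), ("E11.65", date(2015, 3, 2))],
    medications={"metformin", "lisinopril"},
)
print(is_case(patient))
```

Validating such a rule means computing its PPV on manually reviewed charts at the authoring site and again at implementation sites. PheKB's reported comparison follows this pattern.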
The Institute of Medicine (IOM) recommends that health care providers collect data on gender identity. If these data are to be useful, they should utilize terms that characterize gender identity in a manner that is 1) sensitive to transgender and gender non-binary individuals (trans* people) and 2) semantically structured to render associated data meaningful to health care professionals. We developed a set of tools and approaches for analyzing Twitter data as a basis for generating hypotheses on language used to identify gender and discuss gender-related issues across regions and population groups. We offer sample hypotheses regarding regional variations in the usage of certain terms such as 'genderqueer', 'genderfluid', and 'neutrois' and their usefulness as terms on intake forms. While these hypotheses cannot be directly validated with Twitter data alone, our data and tools help to formulate testable hypotheses and design future studies regarding the adequacy of gender identification terms on intake forms.
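The Twitter study generates hypotheses about regional differences in the use of gender-identity terms. The abstract does not describe its tools. The sketch below shows one basic starting point: the share of each region's tweets that mention each term. It assumes tweets already carry a region label. The tweets and region names are hypothetical, and these raw shares are hypothesis-generating only.

```python
import re
from collections import Counter, defaultdict

TERMS = ("genderqueer", "genderfluid", "neutrois")


def term_rates(tweets):
    """tweets: iterable of (region, text) pairs.
    Returns {region: {term: share of that region's tweets mentioning term}}."""
    totals = Counter()
    hits = defaultdict(Counter)
    for region, text in tweets:
        totals[region] += 1
        words = set(re.findall(r"[a-z]+", text.lower()))  # also strips '#'
        for term in TERMS:
            if term in words:
                hits[region][term] += 1
    return {r: {t: hits[r][t] / n for t in TERMS} for r, n in totals.items()}


# Hypothetical region-labeled tweets.
tweets = [
    ("Northeast", "Proud to be #genderqueer"),
    ("Northeast", "coffee time"),
    ("South", "genderfluid and happy"),
    ("South", "Neutrois folks exist too"),
]
print(term_rates(tweets))
```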
Complexity in clinical workflows can lead to inefficiency in making diagnoses, ineffective treatment plans, and uninformed management of healthcare organizations (HCOs). Traditional strategies to manage workflow complexity are based on measuring the gaps between workflows defined by HCO administrators and the actual processes followed by staff in the clinic. However, existing methods tend to neglect the influence of EMR systems on the utilization of workflows, which could be leveraged to optimize workflows facilitated through the EMR. In this paper, we introduce a framework to infer clinical workflows from the utilization of an EMR and show how such workflows roughly partition into four types according to their efficiency. Our framework infers workflows at several levels of granularity through data mining technologies. We study four months of EMR event logs from a large medical center, including 16,569 inpatient stays, and show that approximately 95% of workflows are efficient and that 80% of patients are on such workflows. At the same time, we show that the remaining 5% of workflows may be inefficient due to a variety of factors, such as complex patients.
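The workflow paper infers workflows from EMR event logs but does not detail its data mining methods. Two basic process-mining building blocks illustrate what such inference starts from. The first counts how often one action directly follows another within a stay. The second measures the share of stays that follow each distinct action sequence (variant). Both the framework stand-in and the action names below are hypothetical. Timestamps are plain integers for simplicity.

```python
from collections import Counter


def directly_follows(event_log):
    """Count how often action a is immediately followed by action b within
    the same stay. event_log: {stay_id: [(timestamp, action), ...]}."""
    counts = Counter()
    for events in event_log.values():
        ordered = [action for _, action in sorted(events)]
        counts.update(zip(ordered, ordered[1:]))
    return counts


def variant_shares(event_log):
    """Share of stays following each distinct ordered action sequence."""
    variants = Counter(tuple(a for _, a in sorted(ev)) for ev in event_log.values())
    n = sum(variants.values())
    return {v: c / n for v, c in variants.items()}


# Hypothetical event log with out-of-order events in stay1.
log = {
    "stay1": [(3, "discharge"), (1, "admit"), (2, "lab order")],
    "stay2": [(1, "admit"), (2, "lab order"), (3, "imaging"), (4, "discharge")],
}
print(directly_follows(log))
print(variant_shares(log))
```

Statements such as "80% of patients are on efficient workflows" are naturally expressed as variant shares. Classifying a variant as efficient, however, requires criteria the abstract does not specify.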