The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
OBJECTIVE - To use electronic health records (EHRs) to study systemic lupus erythematosus (SLE), algorithms are needed to accurately identify patients with SLE. We used machine learning to generate data-driven SLE EHR algorithms and assessed the performance of existing rule-based algorithms.
METHODS - We randomly selected subjects with ≥ 1 SLE ICD-9/10 codes from our EHR and identified gold standard definite and probable SLE cases by chart review, based on 1997 ACR or 2012 SLICC Classification Criteria. From a training set, we extracted coded and narrative concepts using natural language processing and generated algorithms using penalized logistic regression to classify definite or definite/probable SLE. We assessed predictive characteristics in internal and external cohort validations. We also tested performance characteristics of published rule-based algorithms with pre-specified permutations of ICD-9 codes, laboratory tests and medications in our EHR.
RESULTS - At a specificity of 97%, our machine learning coded algorithm for definite SLE had 90% positive predictive value (PPV) and 64% sensitivity and for definite/probable SLE, 92% PPV and 47% sensitivity. In the external validation, at 97% specificity, the definite/probable algorithm had 94% PPV and 60% sensitivity. Adding NLP concepts did not improve performance metrics. The PPVs of published rule-based algorithms ranged from 45-79% in our EHR.
CONCLUSION - Our machine learning SLE algorithms performed well in internal and external validation. Rule-based SLE algorithms did not transport as well to our EHR. Unique EHR characteristics, clinical practices and research goals regarding the desired sensitivity and specificity of the case definition must be considered when applying algorithms to identify SLE patients.
Copyright © 2019 Elsevier Inc. All rights reserved.
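The performance figures in this abstract (PPV, sensitivity, specificity) all derive from a 2×2 table comparing algorithm output with the chart-review gold standard. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return PPV, sensitivity, and specificity from confusion-matrix counts.

    tp/fp: algorithm-positive records that are true/false cases on chart review
    fn/tn: algorithm-negative records that are true cases / true non-cases
    """
    return {
        "ppv": tp / (tp + fp),          # of flagged records, fraction that are real cases
        "sensitivity": tp / (tp + fn),  # of real cases, fraction that were flagged
        "specificity": tn / (tn + fp),  # of non-cases, fraction correctly not flagged
    }


# Hypothetical counts for illustration only; these are not the study's data.
metrics = classification_metrics(tp=64, fp=3, fn=36, tn=97)
```

Unlike sensitivity and specificity, PPV depends on how common true cases are among the screened records, which is one reason the same algorithm can show a different PPV in a different EHR.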
BACKGROUND - There are few and conflicting data on the role of cytochrome P450 2D6 (CYP2D6) polymorphisms in relation to risperidone adverse events (AEs) in children. This study assessed the association between CYP2D6 metabolizer status and risk for risperidone AEs in children.
METHODS - Children ≤18 years with at least 4 weeks of risperidone exposure were identified using BioVU, a de-identified DNA biobank linked to electronic health record data. The primary outcome of this study was AEs. After DNA sequencing, individuals were classified as CYP2D6 poor, intermediate, normal, or ultrarapid metabolizers.
RESULTS - For analysis, the 257 individuals were grouped as poor/intermediate metabolizers (n = 33, 13%) and normal/ultrarapid metabolizers (n = 224, 87%). AEs were more common in poor/intermediate vs. normal/ultrarapid metabolizers (15/33, 46% vs. 61/224, 27%, P = 0.04). In multivariate analysis adjusting for age, sex, race, and initial dose, poor/intermediate metabolizers had increased AE risk (adjusted odds ratio 2.4, 95% confidence interval 1.1-5.1, P = 0.03).
CONCLUSION - Children with CYP2D6 poor or intermediate metabolizer phenotypes are at greater risk for risperidone AEs. Pre-prescription genotyping could identify this high-risk subset for an alternate therapy, risperidone dose reduction, and/or increased monitoring for AEs.
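The counts reported in the results (15 of 33 vs. 61 of 224 with AEs) are enough to reproduce a crude, unadjusted odds ratio. The sketch below does that with a Wald confidence interval; it is an illustration only, and because it ignores age, sex, race, and initial dose it gives about 2.2 rather than the adjusted value of 2.4 reported in the abstract. The function name is hypothetical.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval for a 2x2 table.

    a, b: events / non-events in the first group
    c, d: events / non-events in the second group
    z:    normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Counts from the abstract: poor/intermediate vs. normal/ultrarapid metabolizers.
crude_or, ci_low, ci_high = odds_ratio_wald_ci(15, 33 - 15, 61, 224 - 61)
```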
In systemic lupus erythematosus (SLE), dsDNA antibodies are associated with renal disease. Less is known about comorbidities in patients without dsDNA or other autoantibodies. Using an electronic health record (EHR) SLE cohort, we employed a phenome-wide association study (PheWAS) that scans across billing codes to compare comorbidities in SLE patients with and without autoantibodies. We used our validated algorithm to identify SLE subjects. Autoantibody status was defined as ever positive for dsDNA, RNP, Smith, SSA and SSB. PheWAS was performed in antibody positive vs. negative SLE patients adjusting for age and race and using a false discovery rate of 0.05. We identified 1097 SLE subjects. In the PheWAS of dsDNA positive vs. negative subjects, dsDNA positive subjects were more likely to have nephritis (p = 2.33 × 10) and renal failure (p = 1.85 × 10). After adjusting for sex, race, age and other autoantibodies, dsDNA was independently associated with nephritis and chronic kidney disease. Subjects negative for dsDNA, RNP, SSA and SSB were all more likely to have billing codes for sleep, pain and mood disorders. PheWAS uncovered a hierarchy within SLE-specific autoantibodies with dsDNA having the greatest impact on major organ involvement.
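A PheWAS tests many billing-code phenotypes at once, so the false discovery rate of 0.05 mentioned above has to be enforced across the whole set of p-values. The abstract does not say which FDR procedure was used; the sketch below assumes the common Benjamini-Hochberg step-up procedure, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True where the test is declared significant.

    Benjamini-Hochberg: sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * fdr, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per phenotype code (not the study's results).
p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
significant = benjamini_hochberg(p)
```

With these six values only the two smallest pass, even though four are below 0.05 individually.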
Defining the full spectrum of human disease associated with a biomarker is necessary to advance the biomarker into clinical practice. We hypothesize that associating biomarker measurements with electronic health record (EHR) populations based on shared genetic architectures would establish the clinical epidemiology of the biomarker. We use Bayesian sparse linear mixed modeling to calculate SNP weightings for 53 biomarkers from the Atherosclerosis Risk in Communities study. We use the SNP weightings to compute predicted biomarker values in an EHR population and test associations with 1139 diagnoses. Here we report 116 associations meeting a Bonferroni level of significance. A false discovery rate (FDR)-based significance threshold reveals more known and undescribed associations across a broad range of biomarkers, including biometric measures, plasma proteins and metabolites, functional assays, and behaviors. We confirm an inverse association between LDL-cholesterol level and septicemia risk in an independent epidemiological cohort. This approach efficiently discovers biomarker-disease associations.
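Two steps in this abstract are simple arithmetic once the SNP weightings exist: a predicted biomarker value is a weighted sum of a person's allele dosages, and a Bonferroni threshold divides the significance level by the number of tests. The sketch below illustrates both. The SNP identifiers, weights, and dosages are hypothetical, and the test count assumes every one of the 53 biomarkers was tested against every one of the 1139 diagnoses at a family-wise level of 0.05, which the abstract does not state explicitly.

```python
def predicted_biomarker(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies) using per-SNP weights.

    SNPs without a weight are ignored.
    """
    return sum(weights[snp] * dose for snp, dose in dosages.items() if snp in weights)


# Hypothetical weights and genotype for illustration only.
weights = {"rs_a": 0.12, "rs_b": -0.30, "rs_c": 0.05}
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
score = predicted_biomarker(person, weights)

# Assumed test count: every biomarker against every diagnosis.
n_tests = 53 * 1139
bonferroni_alpha = 0.05 / n_tests
```

Under that assumption there are 60,367 tests, so an association would need a p-value below roughly 8.3 × 10⁻⁷ to meet the Bonferroni level.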
BACKGROUND - Observations from statin clinical trials and from Mendelian randomization studies suggest that low low-density lipoprotein cholesterol (LDL-C) concentrations may be associated with increased risk of type 2 diabetes mellitus (T2DM). Despite the findings from statin clinical trials and genetic studies, there is little direct evidence implicating low LDL-C concentrations in increased risk of T2DM.
METHODS AND FINDINGS - We used de-identified electronic health records (EHRs) at Vanderbilt University Medical Center to compare the risk of T2DM in a cross-sectional study among individuals with very low (≤60 mg/dl, N = 8,943) and normal (90-130 mg/dl, N = 71,343) LDL-C levels calculated using the Friedewald formula. LDL-C levels associated with statin use, hospitalization, or a serum albumin level < 3 g/dl were excluded. We used a 2-phase approach: in 1/3 of the sample (discovery) we used T2DM phenome-wide association study codes (phecodes) to identify cases and controls, and in the remaining 2/3 (validation) we identified T2DM cases and controls using a validated algorithm. The analysis plan for the validation phase was constructed at the time of the design of that component of the study. The prevalence of T2DM in the very low and normal LDL-C groups was compared using logistic regression with adjustment for age, race, sex, body mass index (BMI), high-density lipoprotein cholesterol, triglycerides, and duration of care. Secondary analyses included prespecified stratification by sex, race, BMI, and LDL-C level. In the discovery cohort, phecodes related to T2DM were significantly more frequent in the very low LDL-C group. In the validation cohort (N = 33,039 after applying the T2DM algorithm to identify cases and controls), the risk of T2DM was increased in the very low compared to normal LDL-C group (odds ratio [OR] 2.06, 95% CI 1.80-2.37; P < 2 × 10⁻¹⁶). The findings remained significant in sensitivity analyses.
The association between low LDL-C levels and T2DM was significant in males (OR 2.43, 95% CI 2.00-2.95; P < 2 × 10⁻¹⁶) and females (OR 1.74, 95% CI 1.42-2.12; P = 6.88 × 10⁻⁸); in normal weight (OR 2.18, 95% CI 1.59-2.98; P = 1.1 × 10⁻⁶), overweight (OR 2.17, 95% CI 1.65-2.83; P = 1.73 × 10⁻⁸), and obese (OR 2.00, 95% CI 1.65-2.41; P = 8 × 10⁻¹³) categories; and in individuals with LDL-C < 40 mg/dl (OR 2.31, 95% CI 1.71-3.10; P = 3.01 × 10⁻⁸) and LDL-C 40-60 mg/dl (OR 1.99, 95% CI 1.71-2.32; P < 2.0 × 10⁻¹⁶). The association was significant in individuals of European ancestry (OR 2.67, 95% CI 2.25-3.17; P < 2 × 10⁻¹⁶) but not in those of African ancestry (OR 1.09, 95% CI 0.81-1.46; P = 0.56). A limitation was that we only compared groups with very low and normal LDL-C levels; also, since this was not an inception cohort, we cannot exclude the possibility of reverse causation.
CONCLUSIONS - Very low LDL-C concentrations occurring in the absence of statin treatment were significantly associated with T2DM risk in a large EHR population; this increased risk was present in both sexes and all BMI categories, and in individuals of European ancestry but not of African ancestry. Longitudinal cohort studies to assess the relationship between very low LDL-C levels not associated with lipid-lowering therapy and risk of developing T2DM will be important.
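The methods above rely on LDL-C calculated with the Friedewald formula, which estimates LDL-C in mg/dl as total cholesterol minus HDL-C minus triglycerides divided by 5. A minimal sketch of the calculation and of the two comparison groups follows. The function names are hypothetical, and the 400 mg/dl triglyceride guard reflects the formula's conventional validity range rather than an exclusion stated in the abstract.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL-C in mg/dl: total cholesterol - HDL-C - triglycerides / 5."""
    if triglycerides >= 400:
        # Conventional limit of the formula; an assumption, not a stated study rule.
        raise ValueError("Friedewald formula is unreliable at triglycerides >= 400 mg/dl")
    return total_chol - hdl - triglycerides / 5.0


def ldl_group(ldl):
    """Assign the comparison groups defined in the abstract, or None if in neither."""
    if ldl <= 60:
        return "very low"
    if 90 <= ldl <= 130:
        return "normal"
    return None  # values between the groups, or above 130 mg/dl, were not compared


# Hypothetical lipid panels in mg/dl.
group_a = ldl_group(friedewald_ldl(total_chol=120, hdl=50, triglycerides=100))
group_b = ldl_group(friedewald_ldl(total_chol=200, hdl=60, triglycerides=150))
```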
Electronic health record (EHR) algorithms for defining patient cohorts are commonly shared as free-text descriptions that require human intervention both to interpret and implement. We developed the Phenotype Execution and Modeling Architecture (PhEMA, http://projectphema.org) to author and execute standardized computable phenotype algorithms. With PhEMA, we converted an algorithm for benign prostatic hyperplasia, developed for the electronic Medical Records and Genomics network (eMERGE), into a standards-based computable format. Eight sites (7 within eMERGE) received the computable algorithm, and 6 successfully executed it against local data warehouses and/or i2b2 instances. Blinded random chart review of cases selected by the computable algorithm shows PPV ≥90%, and 3 out of 5 sites had >90% overlap of selected cases when comparing the computable algorithm to their original eMERGE implementation. This case study demonstrates potential use of PhEMA computable representations to automate phenotyping across different EHR systems, but also highlights some ongoing challenges.
Clinical vocabularies allow for standard representation of clinical concepts, and can also contain knowledge structures, such as hierarchy, that facilitate the creation of maintainable and accurate clinical decision support (CDS). A key architectural feature of clinical hierarchies is how they handle parent-child relationships - specifically whether hierarchies are strict hierarchies (allowing a single parent per concept) or polyhierarchies (allowing multiple parents per concept). These structures handle subsumption relationships (i.e., ancestor and descendant relationships) differently. In this paper, we describe three real-world malfunctions of clinical decision support related to incorrect assumptions about subsumption checking for β-blockers, specifically carvedilol, a non-selective β-blocker that also has α-blocker activity. We recommend that 1) CDS implementers should learn about the limitations of terminologies, hierarchies, and classification, 2) CDS implementers should thoroughly test CDS, with a focus on special or unusual cases, 3) CDS implementers should monitor feedback from users, and 4) electronic health record (EHR) and clinical content developers should offer and support polyhierarchical clinical terminologies, especially for medications.
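The difference between a strict hierarchy and a polyhierarchy can be made concrete with a small subsumption check. The sketch below uses a hypothetical toy terminology, not the vocabularies examined in the paper: in the strict version carvedilol is filed under a single combined class, so a rule that asks "is this drug a β-blocker?" misses it, while the polyhierarchy lets it descend from both classes.

```python
def ancestors(concept, parents):
    """Collect every ancestor of a concept by walking all parent links."""
    seen = set()
    stack = list(parents.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def is_a(concept, candidate_ancestor, parents):
    """Subsumption check: is the concept equal to, or a descendant of, the candidate?"""
    return concept == candidate_ancestor or candidate_ancestor in ancestors(concept, parents)


# Hypothetical strict hierarchy: exactly one parent per concept.
strict = {
    "carvedilol": ["alpha-beta-blocker"],
    "alpha-beta-blocker": ["antihypertensive"],
    "metoprolol": ["beta-blocker"],
    "beta-blocker": ["antihypertensive"],
}

# Hypothetical polyhierarchy: carvedilol has two parents.
poly = {
    "carvedilol": ["beta-blocker", "alpha-blocker"],
    "metoprolol": ["beta-blocker"],
    "beta-blocker": ["antihypertensive"],
    "alpha-blocker": ["antihypertensive"],
}

missed_in_strict = not is_a("carvedilol", "beta-blocker", strict)
found_in_poly = is_a("carvedilol", "beta-blocker", poly)
```

A CDS rule written against the strict version would need carvedilol listed by hand, which is the kind of special case the recommendations above suggest testing for.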
Motivation - Phenome-wide association studies (PheWAS) have been used to discover many genotype-phenotype relationships and have the potential to identify therapeutic and adverse drug outcomes using longitudinal data within electronic health records (EHRs). However, the statistical methods for PheWAS applied to longitudinal EHR medication data have not been established.
Results - In this study, we developed methods to address two challenges faced with reuse of EHR for this purpose: confounding by indication, and low exposure and event rates. We used Monte Carlo simulation to assess propensity score (PS) methods, focusing on two of the most commonly used methods, PS matching and PS adjustment, to address confounding by indication. We also compared two logistic regression approaches (the default of Wald versus Firth's penalized maximum likelihood, PML) to address complete separation due to sparse data with low exposure and event rates. PS adjustment resulted in greater power than PS matching, while controlling Type I error at 0.05. The PML method provided reasonable P-values, even in cases with complete separation, with well controlled Type I error rates. Using PS adjustment and the PML method, we identify novel latent drug effects in pediatric patients exposed to two common antibiotic drugs, ampicillin and gentamicin.
Availability and implementation - R packages PheWAS and EHR are available at https://github.com/PheWAS/PheWAS and at CRAN (https://www.r-project.org/), respectively. The R script for data processing and the main analysis is available at https://github.com/choileena/EHR.
Supplementary information - Supplementary data are available at Bioinformatics online.
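The separation problem described in the results can be seen in the simplest possible case, a single binary exposure. When one cell of the 2×2 table is empty, the ordinary maximum likelihood log odds ratio diverges, whereas Firth's penalized estimate stays finite. For this saturated two-group case the Firth estimate reduces to adding 0.5 to every cell. The sketch below shows only that special case, with hypothetical counts; it is not the paper's implementation, which handles general covariates and uses the R packages named above.

```python
import math


def ml_log_odds_ratio(a, b, c, d):
    """Ordinary maximum likelihood log odds ratio; None when the estimate diverges."""
    if 0 in (a, b, c, d):
        return None  # separation: no finite maximum likelihood estimate
    return math.log((a * d) / (b * c))


def firth_log_odds_ratio(a, b, c, d):
    """Firth-penalized log odds ratio for one binary exposure.

    a, b: exposed patients with / without the event
    c, d: unexposed patients with / without the event
    In this saturated 2x2 case the Jeffreys-prior penalty is equivalent to
    adding 0.5 to each cell count.
    """
    return math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


# Hypothetical sparse table: every exposed patient had the event.
table = (5, 0, 2, 93)
ml_estimate = ml_log_odds_ratio(*table)
firth_estimate = firth_log_odds_ratio(*table)
```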
The eMERGE Network is establishing methods for electronic transmittal of patient genetic test results from laboratories to healthcare providers across organizational boundaries. We surveyed the capabilities and needs of different network participants, established a common transfer format, and implemented transfer mechanisms based on this format. The interfaces we created are examples of the connectivity that must be instantiated before electronic genetic and genomic clinical decision support can be effectively built at the point of care. This work serves as a case example for both standards bodies and other organizations working to build the infrastructure required to provide better electronic clinical decision support for clinicians.