The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
OBJECTIVE - To quantify and contextualize the risk for coronavirus disease 2019 (COVID-19)-related hospitalization and illness severity in type 1 diabetes.
RESEARCH DESIGN AND METHODS - We conducted a prospective cohort study to identify case subjects with COVID-19 across a regional health care network of 137 service locations. Using an electronic health record query, chart review, and patient contact, we identified clinical factors influencing illness severity.
RESULTS - We identified COVID-19 in 6,138, 40, and 273 patients without diabetes and with type 1 and type 2 diabetes, respectively. Compared with not having diabetes, people with type 1 diabetes had adjusted odds ratios of 3.90 (95% CI 1.75-8.69) for hospitalization and 3.35 (95% CI 1.53-7.33) for greater illness severity, which was similar to risk in type 2 diabetes. Among patients with type 1 diabetes, glycosylated hemoglobin (HbA1c), hypertension, race, recent diabetic ketoacidosis, health insurance status, and less diabetes technology use were significantly associated with illness severity.
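The odds ratios above are adjusted estimates from the study's models. As a minimal sketch of what an odds ratio and its 95% CI express, the unadjusted version can be computed from a 2×2 table with Woolf's log method. The counts in the example are hypothetical, and this does not reproduce the study's covariate adjustment.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple:
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: exposed with outcome     b: exposed without outcome
    c: unexposed with outcome   d: unexposed without outcome
    All four cells must be non-zero for the log-scale standard error.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Hypothetical counts: exposure = diabetes, outcome = hospitalization.
print(odds_ratio_ci(10, 20, 5, 40))
```

Because such intervals are symmetric on the log scale, the point estimate sits at the geometric midpoint of the bounds; for the reported hospitalization estimate, √(1.75 × 8.69) ≈ 3.90.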
CONCLUSIONS - Diabetes status, both type 1 and type 2, independently increases the adverse impacts of COVID-19. Potentially modifiable factors (e.g., HbA1c) had a significant but modest impact compared with comparatively static factors (e.g., race and insurance) in type 1 diabetes, indicating an urgent and continued need to mitigate severe acute respiratory syndrome coronavirus 2 infection risk in this community.
© 2020 by the American Diabetes Association.
Postmarketing population pharmacokinetic (PK) and pharmacodynamic (PD) studies can be useful to capture patient characteristics affecting PK or PD in real-world settings. These studies require longitudinally measured dose, outcomes, and covariates in large numbers of patients; however, prospective data collection is cost-prohibitive. Electronic health records (EHRs) can be an excellent source for such data, but there are challenges, including accurate ascertainment of drug dose. We developed a standardized system to prepare datasets from EHRs for population PK/PD studies. The system covers data extraction from clinical text using a natural language processing algorithm, data processing, and data building. Applying this system, we performed a fentanyl population PK analysis, which yielded parameter estimates comparable to those of a prior study. This new system makes the EHR data extraction and preparation process more efficient and accurate and provides a powerful tool to facilitate postmarketing population PK/PD studies using information available in EHRs.
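The abstract does not describe the natural language processing algorithm itself. To illustrate the dose-ascertainment problem only, here is a minimal regex-based sketch that pulls fentanyl doses out of a note and normalizes them to micrograms. The pattern, the `DoseMention` record, and the supported units and routes are hypothetical simplifications, not the authors' system.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: drug name, numeric amount, unit, optional route.
DOSE_PATTERN = re.compile(
    r"\b(?P<drug>fentanyl)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mcg|mg)"
    r"(?:\s+(?P<route>iv|im|po)\b)?",
    re.IGNORECASE,
)


@dataclass
class DoseMention:
    drug: str
    amount_mcg: float       # dose normalized to micrograms
    route: Optional[str]    # upper-cased route, or None if absent


def extract_doses(note: str) -> list[DoseMention]:
    """Return every dose mention matched by the (hypothetical) pattern."""
    mentions = []
    for m in DOSE_PATTERN.finditer(note):
        amount = float(m.group("amount"))
        if m.group("unit").lower() == "mg":
            amount *= 1000.0  # mg -> mcg
        route = m.group("route").upper() if m.group("route") else None
        mentions.append(DoseMention(m.group("drug").lower(), amount, route))
    return mentions


print(extract_doses("Gave fentanyl 50 mcg IV at 10:15; fentanyl 0.1 mg IV repeated."))
```

A production extractor would also have to handle frequencies, infusion rates, abbreviations, and negated or historical mentions, none of which a pattern this simple attempts.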
© 2020 The Authors. Clinical Pharmacology & Therapeutics © 2020 American Society for Clinical Pharmacology and Therapeutics.
BACKGROUND - Systemic sclerosis (SSc) is a rare disease with studies limited by small sample sizes. Electronic health records (EHRs) represent a powerful tool to study patients with rare diseases such as SSc, but validated methods are needed. We developed and validated EHR-based algorithms that incorporate billing codes and clinical data to identify SSc patients in the EHR.
METHODS - We used a de-identified EHR with over 3 million subjects and identified 1899 potential SSc subjects with at least 1 count of the SSc ICD-9 (710.1) or ICD-10-CM (M34*) codes. We randomly selected 200 as a training set for chart review. A subject was a case if diagnosed with SSc by a rheumatologist, dermatologist, or pulmonologist. We selected the following algorithm components based on clinical knowledge and available data: SSc ICD-9 and ICD-10-CM codes, positive antinuclear antibody (ANA) (titer ≥ 1:80), and a keyword of Raynaud's phenomenon (RP). We used both rule-based and machine learning techniques for algorithm development. Positive predictive values (PPVs), sensitivities, and F-scores (which combine PPV and sensitivity) were calculated for the algorithms.
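A rule-based algorithm of this kind can be sketched as a predicate over each subject's extracted data. This sketch assumes billing codes, ANA titers (as "1:N" strings), and note text are already pulled from the EHR; the `Subject` class, field names, and default thresholds are hypothetical, while the code families (710.1, M34*), the ANA cutoff (≥ 1:80), and the RP keyword come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subject:
    codes: list[str]                                      # one entry per billed occurrence
    ana_titers: list[str] = field(default_factory=list)  # e.g. "1:160"
    notes: list[str] = field(default_factory=list)        # free-text clinical notes


def is_ssc_code(code: str) -> bool:
    # ICD-9 710.1 or any ICD-10-CM code in the M34 family.
    return code == "710.1" or code.upper().startswith("M34")


def ana_positive(subject: Subject, min_denominator: int = 80) -> bool:
    # A titer "1:N" counts as positive when N >= 80 (titer >= 1:80);
    # strings not in "1:N" form (e.g. "negative") are ignored.
    for titer in subject.ana_titers:
        parts = titer.split(":")
        if len(parts) == 2 and parts[1].strip().isdigit():
            if int(parts[1]) >= min_denominator:
                return True
    return False


def has_rp_keyword(subject: Subject) -> bool:
    return any("raynaud" in note.lower() for note in subject.notes)


def rule_based_ssc(subject: Subject, min_code_count: int = 3,
                   require_ana: bool = True, require_rp: bool = False) -> bool:
    """Case if >= min_code_count SSc codes plus the optional ANA/RP criteria."""
    code_count = sum(is_ssc_code(c) for c in subject.codes)
    if code_count < min_code_count:
        return False
    if require_ana and not ana_positive(subject):
        return False
    if require_rp and not has_rp_keyword(subject):
        return False
    return True
```

Varying `min_code_count` and the two flags reproduces the family of permutations the methods describe (code counts alone, codes plus ANA, codes plus RP keyword).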
RESULTS - PPVs were low for algorithms using only 1 count of the SSc ICD-9 code. As code counts increased, the PPVs increased. PPVs were higher for algorithms using ICD-10-CM codes versus the ICD-9 code. Adding a positive ANA and RP keyword increased the PPVs of algorithms only using ICD billing codes. Algorithms using ≥ 3 or ≥ 4 counts of the SSc ICD-9 or ICD-10-CM codes and ANA positivity had the highest PPV (100%) but a low sensitivity (50%). The algorithm with the highest F-score of 91% was ≥ 4 counts of the ICD-9 or ICD-10-CM codes with an internally validated PPV of 90%. A machine learning method using random forests yielded an algorithm with a PPV of 84%, sensitivity of 92%, and F-score of 88%. The most important feature was the RP keyword.
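The reported F-scores are consistent with the F1 score, the harmonic mean of PPV (precision) and sensitivity (recall); that reading is an assumption, since the abstract does not give the formula. A short check against the reported figures:

```python
def ppv(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def f_score(ppv_value: float, sens_value: float) -> float:
    # Harmonic mean of PPV and sensitivity (F1).
    return 2 * ppv_value * sens_value / (ppv_value + sens_value)


# Random forest algorithm: PPV 84%, sensitivity 92% -> 0.88, matching the reported 88%.
print(round(f_score(0.84, 0.92), 2))
# Highest-PPV rule: PPV 100%, sensitivity 50% -> about 0.67.
print(round(f_score(1.00, 0.50), 2))
```

The second line shows why the algorithms with 100% PPV did not have the best F-score: the harmonic mean is pulled toward the weaker of the two metrics.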
CONCLUSIONS - Algorithms using only ICD-9 codes performed poorly at identifying SSc patients. The highest-performing algorithms combined clinical data with billing codes. EHR-based algorithms can identify SSc patients across a healthcare system, enabling researchers to examine important outcomes.
OBJECTIVE - The Phenotype Risk Score (PheRS) is a method to detect Mendelian disease patterns using phenotypes from the electronic health record (EHR). We compared the performance of different approaches mapping EHR phenotypes to Mendelian disease features.
MATERIALS AND METHODS - PheRS uses Mendelian disease descriptions annotated with Human Phenotype Ontology (HPO) terms. In previous work, we presented a map linking phecodes (based on International Classification of Diseases [ICD]-Ninth Revision) to HPO terms. For this study, we integrated ICD-Tenth Revision codes and lab data. We also created a new map linking HPO terms to customized groupings of ICD codes. We compared performance using cases and controls for 16 Mendelian diseases in 2.5 million de-identified medical records.
RESULTS - PheRS effectively distinguished cases from controls for all 15 positive controls and all approaches tested (P < 4 × 10⁻¹⁶). Adding lab data led to a statistically significant improvement for 4 of 14 diseases. The custom ICD groupings improved specificity, leading to an average 8% increase in precision at 100 (-2% to 22%). Eight of 10 adults with cystic fibrosis tested had PheRS in the 95th percentile prior to diagnosis.
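A minimal sketch of the scoring and of "precision at 100" follows. It assumes the weighting used in the original PheRS description, where each feature is weighted by log(N / n_f) so rarer phenotypes count more; that weighting is not stated in this abstract. Patient features are assumed to be HPO terms already mapped from phecodes, custom ICD groupings, or lab results (the mapping step is not shown), and all function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def feature_weights(phenotypes_by_patient: dict[str, set[str]]) -> dict[str, float]:
    """Weight each feature by log(N / n_f): rarer features carry more weight."""
    n_total = len(phenotypes_by_patient)
    counts = Counter(f for feats in phenotypes_by_patient.values() for f in feats)
    return {f: math.log(n_total / n) for f, n in counts.items()}


def phers(patient_features: set[str], disease_features: set[str],
          weights: dict[str, float]) -> float:
    """Sum the weights of the disease's features that the patient has."""
    return sum(weights.get(f, 0.0) for f in patient_features & disease_features)


def precision_at_k(scores: dict[str, float], cases: set[str], k: int) -> float:
    """Fraction of the k highest-scoring patients who are true cases."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(pid in cases for pid in top) / k


# Toy cohort: four patients, one disease described by features F1 and F2.
patients = {"a": {"F1", "F2"}, "b": {"F1"}, "c": set(), "d": {"F3"}}
w = feature_weights(patients)
scores = {pid: phers(feats, {"F1", "F2"}, w) for pid, feats in patients.items()}
print(scores, precision_at_k(scores, {"a"}, 2))
```

With k = 100 over a real cohort, `precision_at_k` is the metric the custom ICD groupings improved.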
DISCUSSION - Both phecodes and custom ICD groupings were able to detect differences between affected cases and controls at the population level. The ICD map showed better precision for the highest scoring individuals. Adding lab data improved performance at detecting population-level differences.
CONCLUSIONS - PheRS is a scalable method to study Mendelian disease at the population level using electronic health record data and can potentially be used to find patients with undiagnosed Mendelian disease.
© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
BACKGROUND - Cardiovascular disease is the leading cause of death in the United States. Consequently, individuals who are genetically predisposed to high risk of cardiovascular disease would benefit most from prevention and early intervention approaches. We evaluated 23 cardiovascular disease-related traits that are common health risk factors in adult populations, including BMI, glucose levels, and lipid profiles, to determine their associations with low-frequency recurrent copy number variations (CNVs) (population frequency < 5%).
RESULTS - We examined 10,619 unrelated subjects of European ancestry from the Electronic Medical Records and Genomics (eMERGE) Network who were genotyped with 657,366 markers genome-wide on the Illumina Infinium Quad 660 array. We performed CNV calling based on array marker intensity and evaluated data quality, ancestry stratification, and relatedness to ensure unbiased association discovery. Using a segment-based scoring approach, we assessed the association of all CNVs with each trait. In this large genome-wide analysis of low-frequency CNVs, we observed 11 novel genome-wide significant associations of low-frequency CNVs with major cardiovascular disease traits.
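The abstract does not specify the segment-based scoring approach. The sketch below swaps in a simpler, commonly used comparison to show the shape of a per-segment, per-trait test: keep CNV segments carried by fewer than 5% of subjects, then compare a quantitative trait between carriers and non-carriers with a pooled two-sample t-statistic (equivalent to regressing the trait on carrier status). Segment IDs, subject IDs, and the data layout are hypothetical.

```python
from __future__ import annotations

import math
from statistics import mean, variance


def low_frequency_segments(carriers_by_segment: dict[str, set[str]],
                           n_subjects: int, max_freq: float = 0.05) -> list[str]:
    """Segments whose carrier frequency is above 0 and below max_freq."""
    return [seg for seg, carriers in carriers_by_segment.items()
            if 0 < len(carriers) / n_subjects < max_freq]


def carrier_t_statistic(trait: dict[str, float], carriers: set[str]) -> float:
    """Pooled-variance t-statistic for mean trait (carriers minus non-carriers).

    Requires at least two carriers and two non-carriers with trait values.
    """
    x1 = [v for sid, v in trait.items() if sid in carriers]
    x0 = [v for sid, v in trait.items() if sid not in carriers]
    n1, n0 = len(x1), len(x0)
    pooled_var = ((n1 - 1) * variance(x1) + (n0 - 1) * variance(x0)) / (n1 + n0 - 2)
    return (mean(x1) - mean(x0)) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))
```

In a genome-wide setting, each statistic would be converted to a p-value and compared against a multiple-testing-corrected significance threshold; that step is omitted here.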
CONCLUSION - In one of the largest genome-wide studies of low-frequency recurrent CNVs, we identified 11 loci associated with cardiovascular disease and related traits at the genome-wide significance level that may serve as biomarkers for prevention and early intervention studies in subjects who are at elevated risk. Our study further supports the role of low-frequency recurrent CNVs in the pathogenesis of common complex disease traits.
Copyright © 2019. Published by Elsevier B.V.
Antimalarials (AMs) reduce disease activity and improve survival in patients with systemic lupus erythematosus (SLE), but studies have reported low AM prescribing frequencies. Using a real-world electronic health record cohort, we examined whether patient or provider characteristics affected AM prescribing. We identified 977 SLE cases, 94% of whom were ever prescribed an AM. Older patients and patients with SLE nephritis were less likely to be on AMs. Current age (odds ratio = 0.97, p < 0.01) and nephritis (odds ratio = 0.16, p < 0.01) were both significantly associated with ever AM use after adjustment for sex and race. Of the 244 SLE nephritis cases, only 63% were currently on AMs. SLE nephritis subjects who were currently prescribed AMs were more likely to be followed by a rheumatologist than a nephrologist and less likely to have undergone dialysis or renal transplant (both p < 0.001). Non-current versus current SLE nephritis AM users had higher serum creatinine (p < 0.001), higher urine protein (p = 0.05), and lower hemoglobin levels (p < 0.01). As AMs reduce disease damage and improve survival in patients with SLE, our results demonstrate an opportunity to target future efforts to improve prescribing rates among multi-specialty providers.
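The age odds ratio is per year. Assuming age entered the logistic model as a linear term (the abstract does not say), the per-year odds ratio compounds multiplicatively, so a decade of age corresponds to roughly 26% lower odds of ever AM use:

```latex
\mathrm{OR}_{10\,\text{yr}} = \left(\mathrm{OR}_{1\,\text{yr}}\right)^{10} = 0.97^{10} \approx 0.74
```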
BACKGROUND - Circulating biomarkers can facilitate diagnosis and risk stratification for complex conditions such as heart failure (HF). Newer molecular platforms can accelerate biomarker discovery, but they require significant resources for data and sample acquisition.
OBJECTIVES - The purpose of this study was to test a pragmatic biomarker discovery strategy integrating automated clinical biobanking with proteomics.
METHODS - Using the electronic health record, the authors identified patients with and without HF, retrieved their discarded plasma samples, and screened these specimens using a DNA aptamer-based proteomic platform (1,129 proteins). Candidate biomarkers were validated in 3 different prospective cohorts.
RESULTS - In an automated manner, plasma samples from 1,315 patients (31% with HF) were collected. Proteomic analysis of a 96-patient subset identified 9 candidate biomarkers (p < 4.42 × 10). Two proteins, angiopoietin-2 and thrombospondin-2, were associated with HF in 3 separate validation cohorts. In an emergency department-based registry of 852 dyspneic patients, the 2 biomarkers improved discrimination of acute HF compared with a clinical score (p < 0.0001) or clinical score plus B-type natriuretic peptide (p = 0.02). In a community-based cohort (n = 768), both biomarkers predicted incident HF independent of traditional risk factors and N-terminal pro-B-type natriuretic peptide (hazard ratio per SD increment: 1.35 [95% confidence interval: 1.14 to 1.61; p = 0.0007] for angiopoietin-2, and 1.37 [95% confidence interval: 1.06 to 1.79; p = 0.02] for thrombospondin-2). Among 30 advanced HF patients, concentrations of both biomarkers declined (80% to 84%) following cardiac transplant (p < 0.001 for both).
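The hazard ratios are per SD increment of each biomarker. Under the proportional hazards model's log-linear assumption, larger differences scale by exponentiation; for example, for angiopoietin-2, a 2-SD higher concentration corresponds to:

```latex
\mathrm{HR}_{2\,\mathrm{SD}} = \mathrm{HR}_{1\,\mathrm{SD}}^{2} = 1.35^{2} \approx 1.82,
\qquad 95\%\ \mathrm{CI}:\ 1.14^{2} \text{ to } 1.61^{2} \approx 1.30 \text{ to } 2.59
```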
CONCLUSIONS - A novel strategy integrating electronic health records, discarded clinical specimens, and proteomics identified 2 biomarkers that robustly predict HF across diverse clinical settings. This approach could accelerate biomarker discovery for many diseases.
Copyright © 2019 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
Benign prostatic hyperplasia (BPH) results in a significant public health burden due to the morbidity caused by the disease and by many of the available remedies. As many as 70% of men over 70 will develop BPH. Few studies have been conducted to discover the genetic determinants of BPH risk. Understanding the biological basis for this condition may provide necessary insight for development of novel pharmaceutical therapies or risk prediction. We have evaluated SNP-based heritability of BPH in two cohorts and conducted a genome-wide association study (GWAS) of BPH risk using 2,656 cases and 7,763 controls identified from the Electronic Medical Records and Genomics (eMERGE) network. SNP-based heritability estimates suggest that roughly 60% of the phenotypic variation in BPH is accounted for by genetic factors. We used logistic regression to model BPH risk as a function of principal components of ancestry, age, and imputed genotype data, with meta-analysis performed using METAL. The top result was on chromosome 22 in SYN3 at rs2710383 (p-value = 4.6 × 10; Odds Ratio = 0.69, 95% confidence interval = 0.55-0.83). Other suggestive signals were near genes GLGC, UNCA13, SORCS1 and between BTBD3 and SPTLC3. We also evaluated genetically-predicted gene expression in prostate tissue. The most significant result was for increased predicted expression of ETV4 (chr17; p-value = 0.0015). Overexpression of this gene has been associated with poor prognosis in prostate cancer. In conclusion, although there were no genome-wide significant variants identified for BPH susceptibility, we present evidence supporting the heritability of this phenotype, have identified suggestive signals, and evaluated the association between BPH and genetically-predicted gene expression in prostate.
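METAL can combine per-cohort results either by sample-size-weighted Z-scores (its default scheme) or by inverse-variance weighting of effect sizes; the abstract does not say which was used. As a sketch of the second option, here is a fixed-effect inverse-variance meta-analysis of per-cohort log odds ratios for one SNP. The input numbers are hypothetical.

```python
from __future__ import annotations

import math


def inverse_variance_meta(betas: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Fixed-effect inverse-variance meta-analysis of per-cohort log odds ratios.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p


# Hypothetical log ORs and standard errors for one SNP in two cohorts.
beta, se, p = inverse_variance_meta([-0.30, -0.40], [0.10, 0.20])
print(math.exp(beta), p)  # pooled odds ratio and p-value
```

The more precise cohort (smaller standard error) dominates the pooled estimate, which is the intended behavior of inverse-variance weighting.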
Although the use of model systems for studying the mechanism of mutations that have a large effect is common, we highlight here the ways that zebrafish-model-system studies of a gene, GRIK5, that contributes to the polygenic liability to develop eye diseases have helped to illuminate a mechanism that implicates vascular biology in eye disease. A gene-expression prediction derived from a reference transcriptome panel applied to BioVU, a large electronic health record (EHR)-linked biobank at Vanderbilt University Medical Center, implicated reduced GRIK5 expression in diverse eye diseases. We tested the function of GRIK5 by depletion of its ortholog in zebrafish, and we observed reduced blood vessel numbers and integrity in the eye and increased vascular permeability. Analyses of EHRs in >2.6 million Vanderbilt subjects revealed significant comorbidity of eye and vascular diseases (relative risks 2-15); this comorbidity was confirmed in 150 million individuals from a large insurance claims dataset. Subsequent studies in >60,000 genotyped BioVU participants confirmed the association of reduced genetically predicted expression of GRIK5 with comorbid vascular and eye diseases. Our studies pioneer an approach that allows a rapid iteration of the discovery of gene-phenotype relationships to the primary genetic mechanism contributing to the pathophysiology of human disease. Our findings also add dimension to the understanding of the biology driven by glutamate receptors such as GRIK5 (also referred to as GLUK5 in protein form) and to mechanisms contributing to human eye diseases.
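Genetically predicted expression of the kind described here is typically a weighted sum of allele dosages, with per-SNP weights learned in a reference transcriptome panel (a PrediXcan-style model; this is an assumption, as the abstract only says "a gene-expression prediction derived from a reference transcriptome panel"). The SNP IDs and weights below are hypothetical.

```python
from __future__ import annotations

from statistics import mean, pstdev


def predicted_expression(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of allele dosages (0-2) over the model's SNPs.

    SNPs missing from `dosages` are skipped (contribute 0), a simplification.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def standardize(values: list[float]) -> list[float]:
    """Z-score predicted expression across the cohort before association testing."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical two-SNP model for one gene.
weights = {"rs_hypothetical_1": 0.5, "rs_hypothetical_2": -0.2}
print(predicted_expression({"rs_hypothetical_1": 2, "rs_hypothetical_2": 1}, weights))
```

The standardized values can then be tested against EHR-derived phenotypes (e.g., eye or vascular disease status) in the genotyped cohort.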
Copyright © 2019 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
OBJECTIVE - Using electronic health records (EHRs) to study systemic lupus erythematosus (SLE) requires algorithms that accurately identify these patients. We used machine learning to generate data-driven SLE EHR algorithms and assessed performance of existing rule-based algorithms.
METHODS - We randomly selected subjects with ≥ 1 SLE ICD-9/10 codes from our EHR and identified gold standard definite and probable SLE cases by chart review, based on 1997 ACR or 2012 SLICC Classification Criteria. From a training set, we extracted coded and narrative concepts using natural language processing and generated algorithms using penalized logistic regression to classify definite or definite/probable SLE. We assessed predictive characteristics in internal and external cohort validations. We also tested performance characteristics of published rule-based algorithms with pre-specified permutations of ICD-9 codes, laboratory tests and medications in our EHR.
RESULTS - At a specificity of 97%, our machine learning coded algorithm for definite SLE had 90% positive predictive value (PPV) and 64% sensitivity and for definite/probable SLE, 92% PPV and 47% sensitivity. In the external validation, at 97% specificity, the definite/probable algorithm had 94% PPV and 60% sensitivity. Adding NLP concepts did not improve performance metrics. The PPVs of published rule-based algorithms ranged from 45-79% in our EHR.
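The metrics above are read off at a fixed specificity (97%). A minimal sketch of choosing such a cutoff, assuming scores come from an already fitted classifier (such as the penalized logistic regression in the methods) and labels from chart review; the function names are hypothetical.

```python
from __future__ import annotations

import math


def threshold_at_specificity(scores: list[float], labels: list[int],
                             target: float = 0.97) -> float:
    """Lowest cutoff t such that predicting positive when score > t keeps
    specificity among label-0 subjects at or above `target`."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    # Number of false positives allowed; the tiny epsilon guards against
    # floating-point error, e.g. (1 - 0.97) * 100 evaluating just under 3.
    allowed_fp = math.floor((1 - target) * len(negatives) + 1e-9)
    if allowed_fp >= len(negatives):
        return -math.inf
    return negatives[allowed_fp]


def metrics_at(scores: list[float], labels: list[int], threshold: float) -> dict:
    pairs = list(zip(scores, labels))
    tp = sum(s > threshold and y == 1 for s, y in pairs)
    fp = sum(s > threshold and y == 0 for s, y in pairs)
    fn = sum(s <= threshold and y == 1 for s, y in pairs)
    tn = sum(s <= threshold and y == 0 for s, y in pairs)
    return {"ppv": tp / (tp + fp) if tp + fp else float("nan"),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}
```

Fixing specificity rather than a probability cutoff makes algorithms comparable across EHRs with different case prevalence, although PPV at that specificity will still vary with prevalence.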
CONCLUSION - Our machine learning SLE algorithms performed well in internal and external validation. Rule-based SLE algorithms did not transport as well to our EHR. Unique EHR characteristics, clinical practices and research goals regarding the desired sensitivity and specificity of the case definition must be considered when applying algorithms to identify SLE patients.
Copyright © 2019 Elsevier Inc. All rights reserved.