About this data

The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.

Results: 1 to 10 of 32

Publication Record

Connections

Rule-based and machine learning algorithms identify patients with systemic sclerosis accurately in the electronic health record.
Jamian L, Wheless L, Crofford LJ, Barnado A
(2019) Arthritis Res Ther 21: 305
MeSH Terms: Adult, Aged, Aged, 80 and over, Algorithms, Databases, Factual, Electronic Health Records, Female, Humans, International Classification of Diseases, Machine Learning, Male, Middle Aged, Reproducibility of Results, Scleroderma, Systemic, Sensitivity and Specificity
Added March 25, 2020
BACKGROUND - Systemic sclerosis (SSc) is a rare disease with studies limited by small sample sizes. Electronic health records (EHRs) represent a powerful tool to study patients with rare diseases such as SSc, but validated methods are needed. We developed and validated EHR-based algorithms that incorporate billing codes and clinical data to identify SSc patients in the EHR.
METHODS - We used a de-identified EHR with over 3 million subjects and identified 1899 potential SSc subjects with at least 1 count of the SSc ICD-9 (710.1) or ICD-10-CM (M34*) codes. We randomly selected 200 as a training set for chart review. A subject was a case if diagnosed with SSc by a rheumatologist, dermatologist, or pulmonologist. We selected the following algorithm components based on clinical knowledge and available data: SSc ICD-9 and ICD-10-CM codes, positive antinuclear antibody (ANA) (titer ≥ 1:80), and a keyword of Raynaud's phenomenon (RP). We performed both rule-based and machine learning techniques for algorithm development. Positive predictive values (PPVs), sensitivities, and F-scores (which account for PPVs and sensitivities) were calculated for the algorithms.
RESULTS - PPVs were low for algorithms using only 1 count of the SSc ICD-9 code. As code counts increased, the PPVs increased. PPVs were higher for algorithms using ICD-10-CM codes versus the ICD-9 code. Adding a positive ANA and RP keyword increased the PPVs of algorithms only using ICD billing codes. Algorithms using ≥ 3 or ≥ 4 counts of the SSc ICD-9 or ICD-10-CM codes and ANA positivity had the highest PPV at 100% but a low sensitivity at 50%. The algorithm with the highest F-score of 91% was ≥ 4 counts of the ICD-9 or ICD-10-CM codes with an internally validated PPV of 90%. A machine learning method using random forests yielded an algorithm with a PPV of 84%, sensitivity of 92%, and F-score of 88%. The most important feature was RP keyword.
CONCLUSIONS - Algorithms using only ICD-9 codes did not perform well to identify SSc patients. The highest performing algorithms incorporated clinical data with billing codes. EHR-based algorithms can identify SSc patients across a healthcare system, enabling researchers to examine important outcomes.
0 Communities
1 Members
0 Resources
15 MeSH Terms
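The systemic sclerosis abstract above reports F-scores alongside PPVs and sensitivities without giving the formula. A minimal sketch, assuming the balanced F-score (the harmonic mean of PPV and sensitivity) is what was used; the function name `f_score` is illustrative and not taken from the paper:

```python
def f_score(ppv: float, sensitivity: float) -> float:
    """Balanced F-score: harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Random forest figures quoted in the abstract: PPV 84%, sensitivity 92%.
random_forest_f = f_score(0.84, 0.92)
print(f"{random_forest_f:.2f}")  # prints 0.88
```

Under that assumption, the random forest's PPV and sensitivity reproduce the 88% F-score quoted in the abstract.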
Validation of discharge diagnosis codes to identify serious infections among middle age and older adults.
Wiese AD, Griffin MR, Stein CM, Schaffner W, Greevy RA, Mitchel EF, Grijalva CG
(2018) BMJ Open 8: e020857
MeSH Terms: Aged, Aged, 80 and over, Algorithms, Clinical Coding, Female, Humans, Infections, International Classification of Diseases, Male, Medicaid, Medical Records, Middle Aged, Patient Discharge, Predictive Value of Tests, Reproducibility of Results, Retrospective Studies, Tennessee, United States
Added July 27, 2018
OBJECTIVES - Hospitalisations for serious infections are common among middle age and older adults and frequently used as study outcomes. Yet, few studies have evaluated the performance of diagnosis codes to identify serious infections in this population. We sought to determine the positive predictive value (PPV) of diagnosis codes for identifying hospitalisations due to serious infections among middle age and older adults.
SETTING AND PARTICIPANTS - We identified hospitalisations for possible infection among adults >=50 years enrolled in the Tennessee Medicaid healthcare programme (2008-2012) using International Classifications of Diseases, Ninth Revision diagnosis codes for pneumonia, meningitis/encephalitis, bacteraemia/sepsis, cellulitis/soft-tissue infections, endocarditis, pyelonephritis and septic arthritis/osteomyelitis.
DESIGN - Medical records were systematically obtained from hospitals randomly selected from a stratified sampling framework based on geographical region and hospital discharge volume.
MEASURES - Two trained clinical reviewers used a standardised extraction form to abstract information from medical records. Predefined algorithms served as reference to adjudicate confirmed infection-specific hospitalisations. We calculated the PPV of diagnosis codes using confirmed hospitalisations as reference. Sensitivity analyses determined the robustness of the PPV to definitions that required radiological or microbiological confirmation. We also determined inter-rater reliability between reviewers.
RESULTS - The PPV of diagnosis codes for hospitalisations for infection (n=716) was 90.2% (95% CI 87.8% to 92.2%). The PPV was highest for pneumonia (96.5% (95% CI 93.9% to 98.0%)) and cellulitis (91.1% (95% CI 84.7% to 94.9%)), and lowest for meningitis/encephalitis (50.0% (95% CI 23.7% to 76.3%)). The adjudication reliability was excellent (92.7% agreement; first agreement coefficient: 0.91). The overall PPV was lower when requiring microbiological confirmation (45%) and when requiring radiological confirmation for pneumonia (79%).
CONCLUSIONS - Discharge diagnosis codes have a high PPV for identifying hospitalisations for common, serious infections among middle age and older adults. PPV estimates for rare infections were imprecise.
© Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
0 Communities
1 Members
0 Resources
18 MeSH Terms
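The serious-infections abstract above reports each PPV with a 95% confidence interval but does not name the interval method. A rough sketch, assuming a Wilson score interval and assuming 646 confirmed hospitalisations out of 716 (646 is inferred from the reported 90.2% and is not stated in the abstract):

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion such as a PPV."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half_width, center + half_width


# Assumed counts: 646 of 716 coded hospitalisations confirmed on chart review.
confirmed, coded = 646, 716
low, high = wilson_interval(confirmed, coded)
print(f"PPV {confirmed / coded:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With these assumed counts the interval comes out close to the reported 87.8% to 92.2%, which is consistent with a Wilson-type method but does not confirm it.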
Validation of an algorithm to identify heart failure hospitalisations in patients with diabetes within the veterans health administration.
Presley CA, Min JY, Chipman J, Greevy RA, Grijalva CG, Griffin MR, Roumie CL
(2018) BMJ Open 8: e020455
MeSH Terms: Adult, Aged, Algorithms, Diabetes Complications, Diabetes Mellitus, Diagnosis-Related Groups, Female, Heart Failure, Hospitalization, Humans, International Classification of Diseases, Male, Middle Aged, Predictive Value of Tests, Sensitivity and Specificity, Veterans
Added July 27, 2018
OBJECTIVES - We aimed to validate an algorithm using both primary discharge diagnosis (International Classification of Diseases Ninth Revision (ICD-9)) and diagnosis-related group (DRG) codes to identify hospitalisations due to decompensated heart failure (HF) in a population of patients with diabetes within the Veterans Health Administration (VHA) system.
DESIGN - Validation study.
SETTING - Veterans Health Administration-Tennessee Valley Healthcare System.
PARTICIPANTS - We identified and reviewed a stratified, random sample of hospitalisations between 2001 and 2012 within a single VHA healthcare system of adults who received regular VHA care and were initiated on an antidiabetic medication between 2001 and 2008. We sampled 500 hospitalisations; 400 hospitalisations that fulfilled algorithm criteria, 100 that did not. Of these, 497 had adequate information for inclusion. The mean patient age was 66.1 years (SD 11.4). Majority of patients were male (98.8%); 75% were white and 20% were black.
PRIMARY AND SECONDARY OUTCOME MEASURES - To determine if a hospitalisation was due to HF, we performed chart abstraction using Framingham criteria as the referent standard. We calculated the positive predictive value (PPV), negative predictive value (NPV), sensitivity and specificity for the overall algorithm and each component (primary diagnosis code (ICD-9), DRG code or both).
RESULTS - The algorithm had a PPV of 89.7% (95% CI 86.8 to 92.7), NPV of 93.9% (89.1 to 98.6), sensitivity of 45.1% (25.1 to 65.1) and specificity of 99.4% (99.2 to 99.6). The PPV was highest for hospitalisations that fulfilled both the ICD-9 and DRG algorithm criteria (92.1% (89.1 to 95.1)) and lowest for hospitalisations that fulfilled only DRG algorithm criteria (62.5% (28.4 to 96.6)).
CONCLUSIONS - Our algorithm, which included primary discharge diagnosis and DRG codes, demonstrated excellent PPV for identification of hospitalisations due to decompensated HF among patients with diabetes in the VHA system.
© Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
0 Communities
1 Members
0 Resources
16 MeSH Terms
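The heart failure abstract above reports PPV, NPV, sensitivity and specificity for its algorithm. As a reminder of how the four measures relate to a chart-review cross-tabulation, here is a small sketch; the class name `ValidationCounts` and the counts are invented for illustration and are not the study's data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Algorithm flag cross-tabulated against the chart-review reference."""
    true_pos: int   # algorithm positive, reference positive
    false_pos: int  # algorithm positive, reference negative
    false_neg: int  # algorithm negative, reference positive
    true_neg: int   # algorithm negative, reference negative

    def ppv(self) -> float:
        return self.true_pos / (self.true_pos + self.false_pos)

    def npv(self) -> float:
        return self.true_neg / (self.true_neg + self.false_neg)

    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)


# Invented counts for illustration only.
counts = ValidationCounts(true_pos=90, false_pos=10, false_neg=20, true_neg=80)
print(counts.ppv(), counts.npv(), counts.sensitivity(), counts.specificity())
```

When algorithm-positive and algorithm-negative records are sampled at different rates, as in the stratified design described above, PPV and NPV can be read directly from the sample, while sensitivity and specificity generally need reweighting to the source population. The abstract does not say how this was handled.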
The time has come for dimensional personality disorder diagnosis.
Hopwood CJ, Kotov R, Krueger RF, Watson D, Widiger TA, Althoff RR, Ansell EB, Bach B, Michael Bagby R, Blais MA, Bornovalova MA, Chmielewski M, Cicero DC, Conway C, De Clercq B, De Fruyt F, Docherty AR, Eaton NR, Edens JF, Forbes MK, Forbush KT, Hengartner MP, Ivanova MY, Leising D, John Livesley W, Lukowitsky MR, Lynam DR, Markon KE, Miller JD, Morey LC, Mullins-Sweatt SN, Hans Ormel J, Patrick CJ, Pincus AL, Ruggero C, Samuel DB, Sellbom M, Slade T, Tackett JL, Thomas KM, Trull TJ, Vachon DD, Waldman ID, Waszczuk MA, Waugh MH, Wright AGC, Yalch MM, Zald DH, Zimmermann J
(2018) Personal Ment Health 12: 82-86
MeSH Terms: Humans, International Classification of Diseases, Personality Disorders
Added March 21, 2018
0 Communities
1 Members
0 Resources
3 MeSH Terms
Embracing Uncertainty in Reconstructing Early Animal Evolution.
King N, Rokas A
(2017) Curr Biol 27: R1081-R1088
MeSH Terms: Animals, Biological Evolution, Classification, Evolution, Molecular, Invertebrates, Phylogeny, Uncertainty
Added March 21, 2018
The origin of animals, one of the major transitions in evolution, remains mysterious. Many key aspects of animal evolution can be reconstructed by comparing living species within a robust phylogenetic framework. However, uncertainty remains regarding the evolutionary relationships between two ancient animal lineages - sponges and ctenophores - and the remaining animal phyla. Comparative morphology and some phylogenomic analyses support the view that sponges represent the sister lineage to the rest of the animals, while other phylogenomic analyses support ctenophores, a phylum of carnivorous, gelatinous marine organisms, as the sister lineage. Here, we explore why different studies yield different answers and discuss the implications of the two alternative hypotheses for understanding the origin of animals. Reconstruction of ancient evolutionary radiations is devilishly difficult and will likely require broader sampling of sponge and ctenophore genomes, improved analytical strategies and critical analyses of the phylogenetic distribution and molecular mechanisms underlying apparently conserved traits. Rather than staking out positions in favor of the ctenophores-sister or the sponges-sister hypothesis, we submit that research programs aimed at understanding the biology of the first animals should instead embrace the uncertainty surrounding early animal evolution in their experimental designs.
Copyright © 2017 Elsevier Ltd. All rights reserved.
0 Communities
1 Members
0 Resources
7 MeSH Terms
A hierarchical causal taxonomy of psychopathology across the life span.
Lahey BB, Krueger RF, Rathouz PJ, Waldman ID, Zald DH
(2017) Psychol Bull 143: 142-186
MeSH Terms: Age Factors, Causality, Classification, Genetic Predisposition to Disease, Humans, Mental Disorders, Models, Psychological, Multivariate Analysis, Sex Factors, Social Environment
Added April 6, 2017
We propose a taxonomy of psychopathology based on patterns of shared causal influences identified in a review of multivariate behavior genetic studies that distinguish genetic and environmental influences that are either common to multiple dimensions of psychopathology or unique to each dimension. At the phenotypic level, first-order dimensions are defined by correlations among symptoms; correlations among first-order dimensions similarly define higher-order domains (e.g., internalizing or externalizing psychopathology). We hypothesize that the robust phenotypic correlations among first-order dimensions reflect a hierarchy of increasingly specific etiologic influences. Some nonspecific etiologic factors increase risk for all first-order dimensions of psychopathology to varying degrees through a general factor of psychopathology. Other nonspecific etiologic factors increase risk only for all first-order dimensions within a more specific higher-order domain. Furthermore, each first-order dimension has its own unique causal influences. Genetic and environmental influences common to family members tend to be nonspecific, whereas environmental influences unique to each individual are more dimension-specific. We posit that these causal influences on psychopathology are moderated by sex and developmental processes. This causal taxonomy also provides a novel framework for understanding the heterogeneity of each first-order dimension: Different persons exhibiting similar symptoms may be influenced by different combinations of etiologic influences from each of the 3 levels of the etiologic hierarchy. Furthermore, we relate the proposed causal taxonomy to transdimensional psychobiological processes, which also impact the heterogeneity of each psychopathology dimension. This causal taxonomy implies the need for changes in strategies for studying the etiology, psychobiology, prevention, and treatment of psychopathology.
(PsycINFO Database Record (c) 2017 APA, all rights reserved).
0 Communities
1 Members
0 Resources
10 MeSH Terms
Developing Electronic Health Record Algorithms That Accurately Identify Patients With Systemic Lupus Erythematosus.
Barnado A, Casey C, Carroll RJ, Wheless L, Denny JC, Crofford LJ
(2017) Arthritis Care Res (Hoboken) 69: 687-693
MeSH Terms: Adult, Aged, Algorithms, Antibodies, Antinuclear, Antirheumatic Agents, Electronic Health Records, Female, Humans, International Classification of Diseases, Lupus Erythematosus, Systemic, Male, Middle Aged, Predictive Value of Tests, Reproducibility of Results, Sensitivity and Specificity
Added March 14, 2018
OBJECTIVE - To study systemic lupus erythematosus (SLE) in the electronic health record (EHR), we must accurately identify patients with SLE. Our objective was to develop and validate novel EHR algorithms that use International Classification of Diseases, Ninth Revision (ICD-9), Clinical Modification codes, laboratory testing, and medications to identify SLE patients.
METHODS - We used Vanderbilt's Synthetic Derivative, a de-identified version of the EHR, with 2.5 million subjects. We selected all individuals with at least 1 SLE ICD-9 code (710.0), yielding 5,959 individuals. To create a training set, 200 subjects were randomly selected for chart review. A subject was defined as a case if diagnosed with SLE by a rheumatologist, nephrologist, or dermatologist. Positive predictive values (PPVs) and sensitivity were calculated for combinations of code counts of the SLE ICD-9 code, a positive antinuclear antibody (ANA), ever use of medications, and a keyword of "lupus" in the problem list. The algorithms with the highest PPV were each internally validated using a random set of 100 individuals from the remaining 5,759 subjects.
RESULTS - The algorithm with the highest PPV at 95% in the training set and 91% in the validation set was 3 or more counts of the SLE ICD-9 code, ANA positive (≥1:40), and ever use of both disease-modifying antirheumatic drugs and steroids, while excluding individuals with systemic sclerosis and dermatomyositis ICD-9 codes.
CONCLUSION - We developed and validated the first EHR algorithm that incorporates laboratory values and medications with the SLE ICD-9 code to identify patients with SLE accurately.
© 2016, American College of Rheumatology.
0 Communities
2 Members
0 Resources
15 MeSH Terms
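The best-performing rule in the lupus abstract above (3 or more counts of the SLE ICD-9 code, ANA positive at 1:40 or higher, ever use of both disease-modifying antirheumatic drugs and steroids, excluding systemic sclerosis and dermatomyositis codes) can be written as a simple predicate. This is only a sketch: the `PatientRecord` layout and function name are hypothetical, the systemic sclerosis code 710.1 is taken from the first record in this listing, and the dermatomyositis code 710.3 is an assumption not given in the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

SLE_CODE = "710.0"              # SLE ICD-9 code, as stated in the abstract
SSC_CODE = "710.1"              # systemic sclerosis ICD-9 code (from the SSc record)
DERMATOMYOSITIS_CODE = "710.3"  # assumption: not given in the abstract


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary extracted from the EHR."""
    icd9_counts: Dict[str, int] = field(default_factory=dict)
    ana_titer: Optional[int] = None  # reciprocal titer, e.g. 40 means 1:40
    ever_dmard: bool = False
    ever_steroid: bool = False


def meets_sle_rule(p: PatientRecord, min_codes: int = 3, min_titer: int = 40) -> bool:
    """Apply the code-count, ANA, medication and exclusion criteria."""
    enough_codes = p.icd9_counts.get(SLE_CODE, 0) >= min_codes
    ana_positive = p.ana_titer is not None and p.ana_titer >= min_titer
    both_drug_classes = p.ever_dmard and p.ever_steroid
    excluded = any(
        p.icd9_counts.get(code, 0) > 0
        for code in (SSC_CODE, DERMATOMYOSITIS_CODE)
    )
    return enough_codes and ana_positive and both_drug_classes and not excluded


case = PatientRecord({"710.0": 4}, ana_titer=160, ever_dmard=True, ever_steroid=True)
overlap = PatientRecord({"710.0": 4, "710.1": 1}, ana_titer=160,
                        ever_dmard=True, ever_steroid=True)
print(meets_sle_rule(case), meets_sle_rule(overlap))  # prints True False
```

Keeping the thresholds as parameters makes it easy to evaluate the other code-count combinations that the abstract mentions comparing.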
PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability.
Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, Pacheco JA, Tromp G, Pathak J, Carrell DS, Ellis SB, Lingren T, Thompson WK, Savova G, Haines J, Roden DM, Harris PA, Denny JC
(2016) J Am Med Inform Assoc 23: 1046-1052
MeSH Terms: Algorithms, Data Mining, Electronic Health Records, Genomics, Humans, International Classification of Diseases, Knowledge Bases, Natural Language Processing, Phenotype
Added March 14, 2018
OBJECTIVE - Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems.
MATERIALS AND METHODS - We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites.
RESULTS - As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%).
DISCUSSION - These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others.
CONCLUSION - By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.
© The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
0 Communities
2 Members
0 Resources
9 MeSH Terms
A multi-institution evaluation of clinical profile anonymization.
Heatherly R, Rasmussen LV, Peissig PL, Pacheco JA, Harris P, Denny JC, Malin BA
(2016) J Am Med Inform Assoc 23: e131-7
MeSH Terms: Confidentiality, Data Anonymization, Electronic Health Records, Humans, Hypothyroidism, Information Dissemination, International Classification of Diseases, Organizational Case Studies
Added March 14, 2018
BACKGROUND AND OBJECTIVE - There is an increasing desire to share de-identified electronic health records (EHRs) for secondary uses, but there are concerns that clinical terms can be exploited to compromise patient identities. Anonymization algorithms mitigate such threats while enabling novel discoveries, but their evaluation has been limited to single institutions. Here, we study how an existing clinical profile anonymization fares at multiple medical centers.
METHODS - We apply a state-of-the-art k-anonymization algorithm, with k set to the standard value 5, to the International Classification of Disease, ninth edition codes for patients in a hypothyroidism association study at three medical centers: Marshfield Clinic, Northwestern University, and Vanderbilt University. We assess utility when anonymizing at three population levels: all patients in 1) the EHR system; 2) the biorepository; and 3) a hypothyroidism study. We evaluate utility using 1) changes to the number included in the dataset, 2) number of codes included, and 3) regions generalization and suppression were required.
RESULTS - Our findings yield several notable results. First, we show that anonymizing in the context of the entire EHR yields a significantly greater quantity of data by reducing the amount of generalized regions from ∼15% to ∼0.5%. Second, ∼70% of codes that needed generalization only generalized two or three codes in the largest anonymization.
CONCLUSIONS - Sharing large volumes of clinical data in support of phenome-wide association studies is possible while safeguarding privacy to the underlying individuals.
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
0 Communities
2 Members
0 Resources
8 MeSH Terms
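The anonymization abstract above sets k to 5, meaning each released clinical profile should be shared by at least 5 patients. The sketch below only checks that property on sets of diagnosis codes; it does not perform the generalization and suppression the paper evaluates, and the function name and example codes are illustrative assumptions.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set


def profiles_below_k(profiles: Iterable[Set[str]], k: int = 5) -> Set[FrozenSet[str]]:
    """Return the distinct code profiles shared by fewer than k patients."""
    counts = Counter(frozenset(profile) for profile in profiles)
    return {profile for profile, n in counts.items() if n < k}


# Illustrative data: five patients share one profile, two share another.
patients = [{"244.9"} for _ in range(5)] + [{"244.9", "250.00"} for _ in range(2)]
risky = profiles_below_k(patients, k=5)
print(risky)
```

Any profile returned here would need to be generalized (merged with similar codes) or suppressed before release to satisfy k-anonymity.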
Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance.
Wei WQ, Teixeira PL, Mo H, Cronin RM, Warner JL, Denny JC
(2016) J Am Med Inform Assoc 23: e20-7
MeSH Terms: Algorithms, Diagnosis, Electronic Health Records, Humans, International Classification of Diseases, Medical Records, Problem-Oriented, Phenotype, Predictive Value of Tests
Added March 14, 2018
OBJECTIVE - To evaluate the phenotyping performance of three major electronic health record (EHR) components: International Classification of Disease (ICD) diagnosis codes, primary notes, and specific medications.
MATERIALS AND METHODS - We conducted the evaluation using de-identified Vanderbilt EHR data. We preselected ten diseases: atrial fibrillation, Alzheimer's disease, breast cancer, gout, human immunodeficiency virus infection, multiple sclerosis, Parkinson's disease, rheumatoid arthritis, and types 1 and 2 diabetes mellitus. For each disease, patients were classified into seven categories based on the presence of evidence in diagnosis codes, primary notes, and specific medications. Twenty-five patients per disease category (a total number of 175 patients for each disease, 1750 patients for all ten diseases) were randomly selected for manual chart review. Review results were used to estimate the positive predictive value (PPV), sensitivity, and F-score for each EHR component alone and in combination.
RESULTS - The PPVs of single components were inconsistent and inadequate for accurately phenotyping (0.06-0.71). Using two or more ICD codes improved the average PPV to 0.84. We observed a more stable and higher accuracy when using at least two components (mean ± standard deviation: 0.91 ± 0.08). Primary notes offered the best sensitivity (0.77). The sensitivity of ICD codes was 0.67. Again, two or more components provided a reasonably high and stable sensitivity (0.59 ± 0.16). Overall, the best performance (F-score: 0.70 ± 0.12) was achieved by using two or more components. Although the overall performance of using ICD codes (0.67 ± 0.14) was only slightly lower than using two or more components, its PPV (0.71 ± 0.13) is substantially worse (0.91 ± 0.08).
CONCLUSION - Multiple EHR components provide a more consistent and higher performance than a single one for the selected phenotypes. We suggest considering multiple EHR components for future phenotyping design in order to obtain an ideal result.
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
0 Communities
1 Members
0 Resources
8 MeSH Terms
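The phenotyping abstract above classifies patients into seven categories from three EHR components and reviews 25 patients per category. A short sketch of that arithmetic, assuming the seven categories are the non-empty combinations of the three components (the abstract does not list them explicitly):

```python
from itertools import combinations

COMPONENTS = ("ICD codes", "primary notes", "medications")

# Every non-empty combination of evidence sources: 3 singles + 3 pairs + 1 triple.
categories = [
    combo
    for size in range(1, len(COMPONENTS) + 1)
    for combo in combinations(COMPONENTS, size)
]

patients_per_category = 25
diseases = 10
per_disease = patients_per_category * len(categories)
total_reviewed = per_disease * diseases

print(len(categories), per_disease, total_reviewed)  # prints 7 175 1750
```

The totals match the 175 patients per disease and 1750 patients overall given in the abstract.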