The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
If you have any questions or comments, please contact us.
The genetic architecture of psychiatric disorders is characterized by a large number of small-effect variants located primarily in non-coding regions, suggesting that the underlying causal effects may influence disease risk by modulating gene expression. We provide comprehensive analyses using transcriptome data from an unprecedented collection of tissues to gain pathophysiological insights into the role of the brain, neuroendocrine factors (adrenal gland) and gastrointestinal systems (colon) in psychiatric disorders. In each tissue, we perform PrediXcan analysis and identify trait-associated genes for schizophrenia (n associations = 499; n unique genes = 275), bipolar disorder (n associations = 17; n unique genes = 13), attention deficit hyperactivity disorder (n associations = 19; n unique genes = 12) and broad depression (n associations = 41; n unique genes = 31). Importantly, both PrediXcan and summary-data-based Mendelian randomization/heterogeneity in dependent instruments analyses suggest potentially causal genes in non-brain tissues, showing the utility of these tissues for mapping psychiatric disease genetic predisposition. Our analyses further highlight the importance of joint tissue approaches as 76% of the genes were detected only in difficult-to-acquire tissues.
Drug-induced cardiovascular complications are the most common adverse drug events and account for the withdrawal or severe restrictions on the use of multitudinous postmarketed drugs. In this study, we developed new in silico models for systematic identification of drug-induced cardiovascular complications in drug discovery and postmarketing surveillance. Specifically, we collected drug-induced cardiovascular complications covering the five most common types of cardiovascular outcomes (hypertension, heart block, arrhythmia, cardiac failure, and myocardial infarction) from four publicly available data resources: Comparative Toxicogenomics Database, SIDER, Offsides, and MetaADEDB. Using these databases, we developed a combined classifier framework through integration of five machine-learning algorithms: logistic regression, random forest, k-nearest neighbors, support vector machine, and neural network. The totality of models included 180 single classifiers with area under receiver operating characteristic curves (AUC) ranging from 0.647 to 0.809 on 5-fold cross-validations. To develop the combined classifiers, we then utilized a neural network algorithm to integrate the best four single classifiers for each cardiovascular outcome. The combined classifiers had higher performance with an AUC range from 0.784 to 0.842 compared to single classifiers. Furthermore, we validated our predicted cardiovascular complications for 63 anticancer agents using experimental data from clinical studies, human pluripotent stem cell-derived cardiomyocyte assays, and literature. The success rate of our combined classifiers reached 87%. In conclusion, this study presents powerful in silico tools for systematic risk assessment of drug-induced cardiovascular complications. This tool is relevant not only in early stages of drug discovery but also throughout the life of a drug including clinical trials and postmarketing surveillance.
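As a rough illustration of the AUC metric and of the idea of combining classifiers, the sketch below computes AUC as the probability that a randomly chosen positive example outscores a randomly chosen negative one. The scores are invented toy values, and the two classifiers are combined by simple score averaging, which stands in for the neural-network integration described in the abstract. It is not the authors' implementation.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = drug with the cardiovascular outcome, 0 = without.
labels = [1, 1, 1, 0, 0, 0]
clf_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]  # scores from one single classifier
clf_b = [0.3, 0.9, 0.7, 0.2, 0.6, 0.1]  # scores from another single classifier

# Assumption: plain averaging replaces the neural-network combiner of the study.
combined = [(a + b) / 2 for a, b in zip(clf_a, clf_b)]

auc_a = roc_auc(labels, clf_a)
auc_b = roc_auc(labels, clf_b)
auc_combined = roc_auc(labels, combined)
```

In this toy case each single classifier misranks one positive-negative pair, while the averaged scores rank every pair correctly, which mirrors the direction of the improvement reported for the combined classifiers.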
Computational protein design has been successful in modeling fixed backbone proteins in a single conformation. However, when modeling large ensembles of flexible proteins, current methods in protein design have been insufficient. Large barriers in the energy landscape are difficult to traverse while redesigning a protein sequence, and as a result current design methods only sample a fraction of available sequence space. We propose a new computational approach that combines traditional structure-based modeling using the Rosetta software suite with machine learning and integer linear programming to overcome limitations in the Rosetta sampling methods. We demonstrate the effectiveness of this method, which we call BROAD, by benchmarking the performance on increasing predicted breadth of anti-HIV antibodies. We use this novel method to increase predicted breadth of naturally-occurring antibody VRC23 against a panel of 180 divergent HIV viral strains and achieve 100% predicted binding against the panel. In addition, we compare the performance of this method to state-of-the-art multistate design in Rosetta and show that we can outperform the existing method significantly. We further demonstrate that sequences recovered by this method recover known binding motifs of broadly neutralizing anti-HIV antibodies. Finally, our approach is general and can be extended easily to other protein systems. Although our modeled antibodies were not tested in vitro, we predict that these variants would have greatly increased breadth compared to the wild-type antibody.
Genomic maps of local ancestry identify ancestry transitions - points on a chromosome where recent recombination events in admixed individuals have joined two different ancestral haplotypes. These events bring together alleles that evolved within separate continental populations, providing a unique opportunity to evaluate the joint effect of these alleles on health outcomes. In this work, we evaluate the impact of genetic variants in the context of nearby local ancestry transitions within a sample of nearly 10,000 adults of African ancestry with traits derived from electronic health records. Genetic data was located using the Metabochip, and used to derive local ancestry. We develop a model that captures the effect of both single variants and local ancestry, and use it to identify examples where local ancestry transitions significantly interact with nearby variants to influence metabolic traits. In our most compelling example, we find that the minor allele of rs16890640 occurring on a European background with a downstream local ancestry transition to African ancestry results in significantly lower mean corpuscular hemoglobin and volume. This finding represents a new way of discovering genetic interactions, and is supported by molecular data that suggest changes to local ancestry may impact local chromatin looping.
Amyloid beta (Aβ) peptides impair multiple cellular pathways and play a causative role in Alzheimer's disease (AD) pathology, but how the brain proteome is remodeled by this process is unknown. To identify protein networks associated with AD-like pathology, we performed global quantitative proteomic analysis in three mouse models at young and old ages. Our analysis revealed a robust increase in Apolipoprotein E (ApoE) levels in nearly all brain regions with increased Aβ levels. Taken together with prior findings on ApoE driving Aβ accumulation, this analysis points to a pathological dysregulation of the ApoE-Aβ axis. We also found dysregulation of protein networks involved in excitatory synaptic transmission. Analysis of the AMPA receptor (AMPAR) complex revealed specific loss of TARPγ-2, a key AMPAR-trafficking protein. Expression of TARPγ-2 in hAPP transgenic mice restored AMPA currents. This proteomic database represents a resource for the identification of protein alterations responsible for AD.
Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Summary - Biological models contain many parameters whose values are difficult to measure directly via experimentation and therefore require calibration against experimental data. Markov chain Monte Carlo (MCMC) methods are suitable to estimate multivariate posterior model parameter distributions, but these methods may exhibit slow or premature convergence in high-dimensional search spaces. Here, we present PyDREAM, a Python implementation of the (Multiple-Try) Differential Evolution Adaptive Metropolis [DREAM(ZS)] algorithm developed by Vrugt and ter Braak (2008) and Laloy and Vrugt (2012). PyDREAM achieves excellent performance for complex, parameter-rich models and takes full advantage of distributed computing resources, facilitating parameter inference and uncertainty estimation of CPU-intensive biological models.
Availability and implementation - PyDREAM is freely available under the GNU GPLv3 license from the Lopez lab GitHub repository at http://github.com/LoLab-VU/PyDREAM.
Supplementary information - Supplementary data are available at Bioinformatics online.
© The Author(s) 2017. Published by Oxford University Press.
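To give a feel for the kind of posterior sampling discussed in the PyDREAM abstract, here is a minimal random-walk Metropolis sampler in plain Python. It is deliberately much simpler than DREAM(ZS) and does not use or imitate the PyDREAM API. The target density, function names, and tuning values are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(theta):
    # Hypothetical one-parameter target: a standard normal, up to a constant.
    return -0.5 * theta * theta


def metropolis(log_post, start, n_steps, step_size, seed=0):
    """Single-chain random-walk Metropolis sampler (not DREAM(ZS))."""
    rng = random.Random(seed)
    current = start
    current_lp = log_post(current)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio); 1 - random() lies
        # in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


samples = metropolis(log_posterior, start=0.0, n_steps=20000, step_size=1.0)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

A single chain with a fixed proposal width, as here, is the setting in which slow convergence tends to appear in high dimensions. Multi-chain adaptive schemes such as DREAM(ZS) are designed to address that.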
Objective - Understanding how to identify the social determinants of health from electronic health records (EHRs) could provide important insights to understand health or disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository.
Materials and Methods - We first constructed lexicons to capture homelessness and ACE phenotypic profiles. We employed word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles with iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results of 2544 patient visits relevant for homelessness and 1000 patients relevant for ACE.
Results - word2vec yielded better performance (area under the precision-recall curve [AUPRC] of 0.94) than lexical associations (AUPRC = 0.83) for extracting homelessness-related words. A comparative study of searches for the 2 phenotypes revealed a higher performance achieved for homelessness (AUPRC = 0.95) than ACE (AUPRC = 0.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and/or physical (50.6%) abuse, with the top-ranked abuser keywords being "father" (21.8%) and "mother" (15.4%). Top prevalent associated conditions for homeless patients were lack of housing (62.8%) and tobacco use disorder (61.5%), while for ACE patients it was mental disorders (36.6%-47.6%).
Conclusion - We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.
© The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
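The AUPRC figures reported above summarize how well a ranked list places relevant items near the top. One common estimator of the area under the precision-recall curve is average precision, sketched below on invented toy data. The abstract does not state which estimator the authors used, so this is an assumption, and the example assumes no tied scores.

```python
def average_precision(labels, scores):
    """Mean of the precision values taken at the rank of each relevant item."""
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("need at least one relevant item")
    # Rank items from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / total_relevant


# Toy ranking (hypothetical): 1 = note judged relevant by an assessor.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_ap = average_precision(toy_labels, toy_scores)
```

Here the relevant items sit at ranks 1, 3 and 4, so the result is the mean of 1, 2/3 and 3/4.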
We hypothesize that the relative mitochondria copy number (MTCN) can be estimated by comparing the abundance of mitochondrial DNA to nuclear DNA reads using high throughput sequencing data. To test this hypothesis, we examined relative MTCN across 13 breast cancer cell lines using the RT-PCR based NovaQUANT Human Mitochondrial to Nuclear DNA Ratio Kit as the gold standard. Six distinct computational approaches were used to estimate the relative MTCN in order to compare to the RT-PCR measurements. The results demonstrate that relative MTCN correlates well with the RT-PCR measurements using exome sequencing data, but not RNA-seq data. Through analysis of copy number variants (CNVs) in The Cancer Genome Atlas, we show that the two nuclear genes used in the NovaQUANT assay to represent the nuclear genome often experience CNVs in tumor cells, questioning the accuracy of this gold-standard method when it is applied to tumor cells.
Copyright © 2017 Elsevier Inc. All rights reserved.
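The core idea of the mitochondrial copy number abstract, comparing mitochondrial to nuclear read abundance, can be sketched as a ratio of read densities. The abstract does not describe its six computational approaches, so the length-normalized ratio below is only one plausible variant. The function names, the read counts, and the exome target size are hypothetical.

```python
MITO_GENOME_BP = 16_569  # length of the human mitochondrial reference (rCRS)


def mito_to_nuclear_ratio(mito_reads, nuclear_reads, nuclear_target_bp,
                          mito_bp=MITO_GENOME_BP):
    """Ratio of reads per base on mitochondrial DNA to reads per base on nuclear DNA."""
    if mito_reads < 0 or nuclear_reads <= 0:
        raise ValueError("read counts must be non-negative, nuclear count positive")
    mito_density = mito_reads / mito_bp
    nuclear_density = nuclear_reads / nuclear_target_bp
    return mito_density / nuclear_density


def relative_mtcn(sample_ratio, reference_ratio):
    """Express one sample's ratio relative to a chosen reference sample."""
    return sample_ratio / reference_ratio


# Hypothetical exome target size and read counts for two cell lines.
EXOME_TARGET_BP = 50_000_000
line_a = mito_to_nuclear_ratio(20_000, 40_000_000, EXOME_TARGET_BP)
line_b = mito_to_nuclear_ratio(40_000, 40_000_000, EXOME_TARGET_BP)
fold_change = relative_mtcn(line_b, line_a)
```

Because only relative values are compared across samples, constant factors such as the assumed target size cancel out as long as every sample is processed the same way.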
OBJECTIVE - To compare three groupings of Electronic Health Record (EHR) billing codes for their ability to represent clinically meaningful phenotypes and to replicate known genetic associations. The three tested coding systems were the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, the Agency for Healthcare Research and Quality Clinical Classification Software for ICD-9-CM (CCS), and manually curated "phecodes" designed to facilitate phenome-wide association studies (PheWAS) in EHRs.
METHODS AND MATERIALS - We selected 100 disease phenotypes and compared the ability of each coding system to accurately represent them without performing additional groupings. The 100 phenotypes included 25 randomly-chosen clinical phenotypes pursued in prior genome-wide association studies (GWAS) and another 75 common disease phenotypes mentioned across free-text problem lists from 189,289 individuals. We then evaluated the performance of each coding system to replicate known associations for 440 SNP-phenotype pairs.
RESULTS - Out of the 100 tested clinical phenotypes, phecodes exactly matched 83, compared to 53 for ICD-9-CM and 32 for CCS. ICD-9-CM codes were typically too detailed (requiring custom groupings) while CCS codes were often not granular enough. Among 440 tested known SNP-phenotype associations, use of phecodes replicated 153 SNP-phenotype pairs compared to 143 for ICD-9-CM and 139 for CCS. Phecodes also generally produced stronger odds ratios and lower p-values for known associations than ICD-9-CM and CCS. Finally, evaluation of several SNPs via PheWAS identified novel potential signals, some seen only when using the phecode approach. Among them, rs7318369 in PEPD was associated with gastrointestinal hemorrhage.
CONCLUSION - Our results suggest that the phecode groupings better align with clinical diseases mentioned in clinical practice or for genomic studies. ICD-9-CM, CCS, and phecode groupings all worked for PheWAS-type studies, though the phecode groupings produced superior results.
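The counts reported in the results can be turned into rates for easier comparison. The short sketch below does only that arithmetic, using the figures stated in the abstract. The variable names are illustrative.

```python
N_PHENOTYPES = 100  # tested clinical phenotypes
N_PAIRS = 440       # tested known SNP-phenotype pairs

exact_matches = {"phecode": 83, "ICD-9-CM": 53, "CCS": 32}
replicated_pairs = {"phecode": 153, "ICD-9-CM": 143, "CCS": 139}

match_rate = {name: n / N_PHENOTYPES for name, n in exact_matches.items()}
replication_rate = {name: n / N_PAIRS for name, n in replicated_pairs.items()}

for name in exact_matches:
    print(f"{name}: exact match {match_rate[name]:.1%}, "
          f"replication {replication_rate[name]:.1%}")
```

Expressed this way, the coding systems differ far more in exact phenotype matching (83% versus 53% and 32%) than in replication of known associations (roughly 35% versus 33% and 32%).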