The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
BACKGROUND - The integration of high-quality, genome-wide analyses offers a robust approach to elucidating genetic factors involved in complex human diseases. Although several methods exist to integrate heterogeneous omics data, most biologists still select candidate genes manually by intersecting lists of candidates from analyses of different types of omics data. Because these lists are generated by imposing hard (strict) thresholds on quantitative variables, such as P-values and fold changes, this practice increases the chance of missing potentially important candidates.
METHODS - To better facilitate the unbiased integration of heterogeneous omics data collected from diverse platforms and samples, we propose a desirability function framework for identifying candidate genes with strong evidence across data types as targets for follow-up functional analysis. Our approach is targeted towards disease systems with sparse, heterogeneous omics data, so we tested it on one such pathology: spontaneous preterm birth (sPTB).
RESULTS - We developed the software integRATE, which uses desirability functions to rank genes both within and across studies, identifying well-supported candidate genes according to the cumulative weight of biological evidence rather than the imposition of hard thresholds on key variables. Integrating 10 sPTB omics studies identified both genes in pathways previously suspected to be involved in sPTB and novel genes never before linked to this syndrome. integRATE is available as an R package on GitHub (https://github.com/haleyeidem/integRATE).
CONCLUSIONS - Desirability-based data integration is a solution most applicable in biological research areas where omics data is especially heterogeneous and sparse, allowing for the prioritization of candidate genes that can be used to inform more targeted downstream functional analyses.
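The desirability idea described in this abstract can be illustrated with a minimal Python sketch. It is not integRATE's code or API (integRATE is an R package); the function names, the linear-with-exponent transformation, the weighted arithmetic mean, and the choice to score a gene as 0 in studies where it is absent are all assumptions made for illustration.

```python
from typing import Dict, Optional


def desirability_low(x: float, lo: float, hi: float, scale: float = 1.0) -> float:
    """Low-is-better variable (e.g. a P-value): 1 at or below lo, 0 at or above hi."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return ((hi - x) / (hi - lo)) ** scale


def desirability_high(x: float, lo: float, hi: float, scale: float = 1.0) -> float:
    """High-is-better variable (e.g. |fold change|): 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return ((x - lo) / (hi - lo)) ** scale


def rank_genes(
    studies: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
) -> list:
    """Combine per-study desirabilities into one score per gene and sort.

    studies maps study name -> {gene: desirability in [0, 1]}.
    Assumption: a gene missing from a study contributes 0 for that study,
    and studies are combined by a weighted arithmetic mean.
    """
    if weights is None:
        weights = {name: 1.0 for name in studies}
    total_weight = sum(weights[name] for name in studies)
    genes = {g for scores in studies.values() for g in scores}
    overall = {
        g: sum(weights[name] * scores.get(g, 0.0) for name, scores in studies.items())
        / total_weight
        for g in genes
    }
    # Highest overall desirability first; ties broken alphabetically.
    return sorted(overall.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    example = {
        "study_1": {"A": 0.9, "B": 0.4},
        "study_2": {"A": 0.5, "C": 1.0},
    }
    for gene, score in rank_genes(example):
        print(gene, round(score, 3))
```

Unlike a hard threshold, no gene is discarded outright: a gene with moderate evidence in several studies can outrank one with strong evidence in a single study.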
Genomic regions with gene regulatory enhancer activity turn over rapidly across mammals. In contrast, gene expression patterns and transcription factor (TF) binding preferences are largely conserved between mammalian species. Based on this conservation, we hypothesized that enhancers active in different mammals would exhibit conserved sequence patterns in spite of their different genomic locations. To investigate this hypothesis, we evaluated the extent to which sequence patterns that are predictive of enhancers in one species are predictive of enhancers in other mammalian species by training and testing two types of machine learning models. We trained support vector machine (SVM) and convolutional neural network (CNN) classifiers to distinguish enhancers defined by histone marks from the genomic background based on DNA sequence patterns in human, macaque, mouse, dog, cow, and opossum. The classifiers accurately identified many adult liver, developing limb, and developing brain enhancers, and the CNNs outperformed the SVMs. Furthermore, classifiers trained in one species and tested in another performed nearly as well as classifiers trained and tested on the same species. We observed similar cross-species conservation when applying the models to human and mouse enhancers validated in transgenic assays. This indicates that many short sequence patterns predictive of enhancers are largely conserved. The sequence patterns most predictive of enhancers in each species matched the binding motifs for a common set of TFs enriched for expression in relevant tissues, supporting the biological relevance of the learned features. Thus, despite the rapid change of active enhancer locations between mammals, cross-species enhancer prediction is often possible. Our results suggest that short sequence patterns encoding enhancer activity have been maintained across more than 180 million years of mammalian evolution.
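As a rough illustration of what "short sequence patterns" can mean as classifier input, the sketch below computes a k-mer spectrum, one common way to turn a DNA sequence into features that do not depend on its genomic location. This is an assumption for illustration only: the abstract does not specify the features used, and the function names and the default k are hypothetical.

```python
from collections import Counter
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Collapse a k-mer and its reverse complement to one representative."""
    return min(kmer, reverse_complement(kmer))


def kmer_spectrum(seq: str, k: int = 6) -> Dict[str, float]:
    """Relative frequency of each canonical k-mer in seq.

    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    print(kmer_spectrum("ACGTTGCA", k=2))
```

Because the features describe sequence content only, a model trained on vectors like these from one species can be applied unchanged to sequences from another, which is the kind of cross-species test the abstract describes.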
Bcl-2 family proteins reorganize mitochondrial membranes during apoptosis to form pores and rearrange cristae. In vitro and in vivo analysis integrated with human genetics reveals a novel homeostatic mitochondrial function for the Bcl-2 family protein Bid. Loss of full-length Bid results in apoptosis-independent, irregular cristae with decreased respiration. Mice lacking Bid display stress-induced myocardial dysfunction and damage. A gene-based approach applied to a biobank, validated in two independent GWAS, reveals that decreased genetically determined BID expression associates with myocardial infarction (MI) susceptibility. Patients in the bottom 5% of the expression distribution exhibit a more than 4-fold increased MI risk. Carrier status with nonsynonymous variation in Bid's membrane binding domain associates with MI predisposition. Furthermore, wild-type Bid, but not the variant, associates with Mcl-1, previously implicated in cristae stability; decreased MCL-1 expression associates with MI. Our results identify a role for Bid in homeostatic mitochondrial cristae reorganization that we link to human cardiac disease.
© 2018, Salisbury-Ruf et al.
Electronic health record (EHR) algorithms for defining patient cohorts are commonly shared as free-text descriptions that require human intervention both to interpret and implement. We developed the Phenotype Execution and Modeling Architecture (PhEMA, http://projectphema.org) to author and execute standardized computable phenotype algorithms. With PhEMA, we converted an algorithm for benign prostatic hyperplasia, developed for the electronic Medical Records and Genomics network (eMERGE), into a standards-based computable format. Eight sites (7 within eMERGE) received the computable algorithm, and 6 successfully executed it against local data warehouses and/or i2b2 instances. Blinded random chart review of cases selected by the computable algorithm shows PPV ≥90%, and 3 out of 5 sites had >90% overlap of selected cases when comparing the computable algorithm to their original eMERGE implementation. This case study demonstrates potential use of PhEMA computable representations to automate phenotyping across different EHR systems, but also highlights some ongoing challenges.
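The two validation figures in this abstract (PPV from chart review, and overlap between the computable and original implementations) reduce to simple arithmetic, sketched below. The function names are hypothetical, and the overlap definition (the share of cases selected by the computable algorithm that the original implementation also selected) is an assumption, since the abstract does not define it.

```python
from typing import Set


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = confirmed cases / all cases selected for chart review."""
    reviewed = true_positives + false_positives
    if reviewed == 0:
        raise ValueError("no reviewed cases")
    return true_positives / reviewed


def overlap_fraction(computable_cases: Set[str], original_cases: Set[str]) -> float:
    """Share of computable-algorithm cases also found by the original implementation."""
    if not computable_cases:
        raise ValueError("computable algorithm selected no cases")
    return len(computable_cases & original_cases) / len(computable_cases)


if __name__ == "__main__":
    # Made-up counts and identifiers, for illustration only.
    print(positive_predictive_value(45, 5))
    print(overlap_fraction({"p1", "p2", "p3", "p4"}, {"p2", "p3", "p4", "p5"}))
```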
Proteomics, metabolomics, and transcriptomics generate comprehensive data sets, and current biocomputational capabilities allow their efficient integration for systems biology analysis. Published multiomics studies cover methodological advances as well as applications to biological questions. However, few studies have focused on the development of a high-throughput, unified sample preparation approach to complement high-throughput omic analytics. This report details the automation, benchmarking, and application of a strategy for transcriptomic, proteomic, and metabolomic analyses from a common sample. The approach, sample preparation for multi-omics technologies (SPOT), provides equivalent performance to typical individual omic preparation methods but greatly enhances throughput and minimizes the resources required for multiomic experiments. SPOT was applied to a multiomics time course experiment for zinc-treated HL-60 cells. The data reveal Zn effects on NRF2 antioxidant and NF-κB signaling. High-throughput approaches such as these are critical for the acquisition of temporally resolved, multicondition, large multiomic data sets such as those necessary to assess complex clinical and biological concerns. Ultimately, this type of approach will provide an expanded understanding of challenging scientific questions across many fields.
The eMERGE Network is establishing methods for electronic transmittal of patient genetic test results from laboratories to healthcare providers across organizational boundaries. We surveyed the capabilities and needs of different network participants, established a common transfer format, and implemented transfer mechanisms based on this format. The interfaces we created are examples of the connectivity that must be instantiated before electronic genetic and genomic clinical decision support can be effectively built at the point of care. This work serves as a case example for both standards bodies and other organizations working to build the infrastructure required to provide better electronic clinical decision support for clinicians.
The completion of the Human Genome Project has unleashed a wealth of human genomics information, but it remains unclear how best to implement this information for the benefit of patients. The standard approach of biomedical research, with researchers pursuing advances in knowledge in the laboratory and, separately, clinicians translating research findings into the clinic as much as decades later, will need to give way to new interdisciplinary models for research in genomic medicine. These models should include scientists and clinicians actively working as teams to study patients and populations recruited in clinical settings and communities to make genomics discoveries (through the combined efforts of data scientists, clinical researchers, epidemiologists, and basic scientists) and to rapidly apply these discoveries in the clinic for the prediction, prevention, diagnosis, prognosis, and treatment of cardiovascular diseases and stroke. The highly publicized US Precision Medicine Initiative, also known as All of Us, is a large-scale program funded by the US National Institutes of Health that will energize these efforts, but several ongoing studies such as the UK Biobank Initiative; the Million Veteran Program; the Electronic Medical Records and Genomics Network; the Kaiser Permanente Research Program on Genes, Environment and Health; and the DiscovEHR collaboration are already providing exemplary models of this kind of interdisciplinary work. In this statement, we outline the opportunities and challenges in broadly implementing new interdisciplinary models in academic medical centers and community settings and bringing the promise of genomics to fruition.
© 2018 American Heart Association, Inc.
This review will provide an overview of the principles of pharmacogenomics from basic discovery to implementation, encompassing application of tools of contemporary genome science to the field (including areas of apparent divergence from disease-based genomics), a summary of lessons learned from the extensively studied drugs clopidogrel and warfarin, the current status of implementing pharmacogenetic testing in practice, the role of genomics and related tools in the drug development process, and a summary of future opportunities and challenges.
© 2018 American Heart Association, Inc.
We performed an extensive immunogenomic analysis of more than 10,000 tumors comprising 33 diverse cancer types by utilizing data compiled by TCGA. Across cancer types, we identified six immune subtypes (wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, immunologically quiet, and TGF-β dominant) characterized by differences in macrophage or lymphocyte signatures, Th1:Th2 cell ratio, extent of intratumoral heterogeneity, aneuploidy, extent of neoantigen load, overall cell proliferation, expression of immunomodulatory genes, and prognosis. Specific driver mutations correlated with lower (CTNNB1, NRAS, or IDH1) or higher (BRAF, TP53, or CASP8) leukocyte levels across all cancers. Multiple control modalities of the intracellular and extracellular networks (transcription, microRNAs, copy number, and epigenetic processes) were involved in tumor-immune cell interactions, both across and within immune subtypes. Our immunogenomics pipeline to characterize these heterogeneous tumors and the resulting data are intended to serve as a resource for future targeted studies to further advance the field.
Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale.
Copyright © 2018 Elsevier Inc. All rights reserved.