The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
Single-cell RNA sequencing (scRNA-seq) has become a powerful tool for the systematic investigation of cellular diversity. As a number of computational tools have been developed to identify and visualize cell populations within a single scRNA-seq dataset, there is a need for methods to quantitatively and statistically define proportional shifts in cell population structures across datasets, such as expansion or shrinkage or emergence or disappearance of cell populations. Here we present sc-UniFrac, a framework to statistically quantify compositional diversity in cell populations between single-cell transcriptome landscapes. sc-UniFrac enables sensitive and robust quantification in simulated and experimental datasets in terms of both population identity and quantity. We have demonstrated the utility of sc-UniFrac in multiple applications, including assessment of biological and technical replicates, classification of tissue phenotypes and regional specification, identification and definition of altered cell infiltrates in tumorigenesis, and benchmarking batch-correction tools. sc-UniFrac provides a framework for quantifying diversity or alterations in cell populations across conditions and has broad utility for gaining insight into tissue-level perturbations at the single-cell resolution.
BACKGROUND - High-throughput sequencing technology enables both the human genome and transcriptome to be screened at single-nucleotide resolution. Tools have been developed to infer single nucleotide variants (SNVs) from both DNA and RNA sequencing data. To evaluate how much difference can be expected between DNA and RNA sequencing data, and among tissue sources, we designed a study to examine the single nucleotide differences among five sources of high-throughput sequencing data generated from the same individual: exome sequencing from blood, tumor and adjacent normal tissue, and RNAseq from tumor and adjacent normal tissue.
RESULTS - Through careful quality control and analysis of the SNVs, we found little difference between DNA-DNA pairs (1%-2%). However, between DNA-RNA pairs, SNV differences ranged anywhere from 10% to 20%.
CONCLUSIONS - Only a small portion of these differences can be explained by RNA editing. Instead, the majority of the DNA-RNA differences should be attributed to technical errors from sequencing and post-processing of RNAseq data. Our analysis results suggest that SNV detection using RNAseq is subject to high false positive rates.
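The pairwise SNV differences reported in the abstract above (1%-2% for DNA-DNA pairs, 10%-20% for DNA-RNA pairs) can be illustrated with a minimal Python sketch. The abstract does not describe the authors' pipeline, so everything here is an assumption: the function name `snv_discordance`, the dictionary layout keyed by (chromosome, position), and the toy genotypes are all hypothetical.

```python
def snv_discordance(calls_a, calls_b):
    """Fraction of sites genotyped in both samples whose genotypes differ.

    Each argument maps a (chromosome, position) tuple to a genotype string.
    This layout is an illustrative assumption, not the study's format.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two call sets share no sites")
    differing = sum(1 for site in shared if calls_a[site] != calls_b[site])
    return differing / len(shared)


# Toy data only; these genotypes are invented for illustration.
exome_blood = {
    ("chr1", 100): "A/G",
    ("chr1", 200): "C/C",
    ("chr2", 50): "T/T",
    ("chr2", 75): "G/G",
}
rnaseq_tumor = {
    ("chr1", 100): "A/G",
    ("chr1", 200): "C/T",  # differs from the exome call
    ("chr2", 50): "T/T",
    ("chr2", 75): "G/G",
    ("chr3", 10): "A/A",   # not covered in the exome data, so ignored
}

print(snv_discordance(exome_blood, rnaseq_tumor))
```

Restricting the comparison to sites called in both samples keeps coverage gaps from being counted as genotype differences.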
Heparin-induced thrombocytopenia (HIT) is an unpredictable, life-threatening, immune-mediated reaction to heparin. Variation in human leukocyte antigen (HLA) genes is now used to prevent immune-mediated adverse drug reactions. Combinations of HLA alleles and killer cell immunoglobulin-like receptors (KIR) are associated with multiple autoimmune diseases and infections. The objective of this study is to evaluate the association of HLA alleles and KIR types, alone or in the presence of different HLA ligands, with HIT. HIT cases and heparin-exposed controls were identified in BioVU, an electronic health record coupled to a DNA biobank. HLA sequencing and KIR type imputation using Illumina OMNI-Quad data were performed. Odds ratios for HLA alleles and KIR types and HLA*KIR interactions using conditional logistic regressions were determined in the overall population and by race/ethnicity. Analysis was restricted to KIR types and HLA alleles with a frequency greater than 0.01. The p values for HLA and KIR association were corrected by using a false discovery rate q<0.05 and HLA*KIR interactions were considered significant at p<0.05. Sixty-five HIT cases and 350 matched controls were identified. No statistical differences in baseline characteristics were observed between cases and controls. The HLA-DRB3*01:01 allele was significantly associated with HIT in the overall population (odds ratio 2.81 [1.57-5.02], p=2.1×10 , q=0.02) and in individuals with European ancestry, independent of other alleles. No KIR types were associated with HIT, although a significant interaction was observed between KIR2DS5 and the HLA-C1 KIR binding group (p=0.03). The HLA-DRB3*01:01 allele was identified as a potential risk factor for HIT. This class II HLA gene and allele represent biologically plausible candidates for influencing HIT pathogenesis. We found limited evidence of the role of KIR types in HIT pathogenesis. 
Replication and further study of the HLA-DRB3*01:01 association is necessary.
© 2017 Pharmacotherapy Publications, Inc.
One hundred healthy infants enrolled as controls in a tuberculosis vaccine study in Nyanza Province, Kenya provided anonymized samples for DNA sequence-based typing at the HLA-A, -B, -C, -DPB1, -DQA1, -DQB1, -DRB1, and -DRB3/4/5 loci. The purpose of the study was to characterize allele frequencies in the local population, to support studies of T cell immunity against pathogens, including Mycobacterium tuberculosis. There are no detectable deviations from Hardy-Weinberg proportions for the HLA-B, -C, -DRB1, -DPB1, -DQA1 and -DQB1 loci. A minor deviation was detected at the HLA-A locus due to an excess of HLA-A*02:02, 29:02, 30:02, and 68:02 homozygotes. The genotype data are available in the Allele Frequencies Net Database under identifier 3393.
Copyright © 2017. Published by Elsevier Inc.
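The homozygote excess described in the Kenyan HLA abstract above is measured against Hardy-Weinberg proportions, under which an allele with frequency p is expected in p² of individuals as a homozygote. The following is a minimal Python sketch of that comparison only. It is not the test the authors used (the abstract does not name one), and the function name and toy genotypes are hypothetical.

```python
from collections import Counter


def homozygote_counts(genotypes):
    """Return {allele: (observed homozygotes, expected homozygotes)}.

    `genotypes` is a list of (allele, allele) tuples, one per individual.
    The expectation n * p**2 assumes Hardy-Weinberg proportions.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    observed = Counter(a for a, b in genotypes if a == b)
    result = {}
    for allele, count in allele_counts.items():
        freq = count / (2 * n)  # each individual carries two alleles
        result[allele] = (observed.get(allele, 0), n * freq ** 2)
    return result


# Toy genotypes with invented allele labels, for illustration only.
sample = [("A1", "A1"), ("A1", "A2"), ("A2", "A2"), ("A1", "A2")]
for allele, (obs, exp) in sorted(homozygote_counts(sample).items()):
    print(allele, obs, exp)
```

A formal assessment would compare these counts with an exact or chi-square test; the sketch stops at the observed-versus-expected step.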
To assess the effect of chemotherapy on mitochondrial genome mutations in cancer survivors and their offspring, a study sequenced the full mitochondrial genome and determined the mitochondrial DNA (mtDNA) heteroplasmic mutation rate. To build a model for counts of heteroplasmic mutations in mothers and their offspring, bivariate Poisson regression was used to examine the relationship between mutation count and clinical information while accounting for the paired correlation. However, if the sequencing depth is not adequate, only a limited fraction of the mtDNA will be available for variant calling. The classical bivariate Poisson regression model treats the offset term as equal within pairs; thus, it cannot be applied directly. In this research, we propose an extended bivariate Poisson regression model that has a more general offset term to adjust for the length of the accessible genome for each observation. We evaluate the performance of the proposed method with comprehensive simulations, and the results show that the regression model provides unbiased parameter estimates. The use of the model is also demonstrated using the paired mtDNA dataset.
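The role of the offset can be made concrete with the standard trivariate-reduction form of the bivariate Poisson model. This is a sketch under stated assumptions, not the paper's exact formulation: the symbols $t_{ki}$ (accessible genome length for member $k$ of pair $i$), $\mathbf{x}_{ki}$ (covariates) and $\boldsymbol{\beta}_k$ are introduced here for illustration.

```latex
\begin{align}
Y_{1i} &= X_{1i} + X_{3i}, \qquad Y_{2i} = X_{2i} + X_{3i}, \\
X_{ki} &\sim \mathrm{Poisson}(\lambda_{ki}) \ \text{independently}, \quad k = 1, 2, 3, \\
\log \lambda_{ki} &= \log t_{ki} + \mathbf{x}_{ki}^{\top} \boldsymbol{\beta}_k, \quad k = 1, 2.
\end{align}
```

Here $Y_{1i}$ and $Y_{2i}$ are the mutation counts for mother and offspring, and the shared term $X_{3i}$ induces the within-pair correlation, since $\mathrm{Cov}(Y_{1i}, Y_{2i}) = \lambda_{3i}$. The classical model corresponds to the special case $t_{1i} = t_{2i}$; allowing $t_{1i} \neq t_{2i}$ reflects different accessible genome lengths within a pair. How the shared term $\lambda_{3i}$ is offset is not stated in the abstract and is left open here.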
Due to their functional independence, proteins that comprise standalone metabolic units, which we name single-protein metabolic modules, may be particularly prone to gene duplication (GD) and horizontal gene transfer (HGT). Flavohemoglobins (flavoHbs) are prime examples of single-protein metabolic modules, detoxifying nitric oxide (NO), a ubiquitous toxin whose antimicrobial properties many life forms exploit, to nitrate, a common source of nitrogen for organisms. FlavoHbs appear widespread in bacteria and have been identified in a handful of microbial eukaryotes, but how the distribution of this ecologically and biomedically important protein family evolved remains unknown. Reconstruction of the evolutionary history of 3,318 flavoHb protein sequences covering the family's known diversity showed evidence of recurrent HGT at multiple evolutionary scales including intrabacterial HGT, as well as HGT from bacteria to eukaryotes. One of the most striking examples of HGT is the acquisition of a flavoHb by the dandruff- and eczema-causing fungus Malassezia from Corynebacterium Actinobacteria, a transfer that growth experiments show is capable of mediating NO resistance in fungi. Other flavoHbs arose via GD; for example, many filamentous fungi possess two flavoHbs that are differentially targeted to the cytosol and mitochondria, likely conferring protection against external and internal sources of NO, respectively. Because single-protein metabolic modules such as flavoHb function independently, readily undergo GD and HGT, and are frequently involved in organismal defense and competition, we suggest that they represent "plug-and-play" proteins for ecological arms races.
© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
Functional annotation of genetic variants including single nucleotide polymorphisms (SNPs) and copy number variations (CNV) promises to greatly improve our understanding of human complex traits. Previous transcriptomic studies involving individuals from different global populations have investigated the genetic architecture of gene expression variation by mapping expression quantitative trait loci (eQTL). Functional interpretation of genome-wide association studies (GWAS) has identified enrichment of eQTL in top signals from GWAS of human complex traits. The SCAN (SNP and CNV Annotation) database was developed as a web-based resource of genetical genomic studies including eQTL detected in the HapMap lymphoblastoid cell line samples derived from apparently healthy individuals of European and African ancestry. Considering the critical roles of epigenetic gene regulation, cytosine modification quantitative trait loci (mQTL) are expected to add a crucial layer of annotation to existing functional genomic information. Here, we describe the new features of the SCAN database that integrate comprehensive mQTL mapping results generated in the HapMap CEU (Caucasian residents from Utah, USA) and YRI (Yoruba people from Ibadan, Nigeria) LCL samples and demonstrate the utility of the enhanced functional annotation system.
© The Author(s) 2015. Published by Oxford University Press.
We previously identified a low-frequency (1.1%) coding variant (G45R; rs200573126) in the adiponectin gene (ADIPOQ) which was the basis for a multipoint microsatellite linkage signal (LOD = 8.2) for plasma adiponectin levels in Hispanic families. We have empirically evaluated the ability of data from targeted common variants, exome chip genotyping, and genome-wide association study data to detect linkage and association to adiponectin protein levels at this locus. Simple two-point linkage and association analyses were performed in 88 Hispanic families (1,150 individuals) using 10,958 SNPs on chromosome 3. Approaches were compared for their ability to map the functional variant, G45R, which was strongly linked (two-point LOD = 20.98) and powerfully associated (p value = 8.1 × 10⁻⁵⁰). Over 450 SNPs within a broad 61 Mb interval around rs200573126 showed nominal evidence of linkage (LOD > 3) but only four other SNPs in this region were associated with p values < 1.0 × 10⁻⁴. When G45R was accounted for, the maximum LOD score across the interval dropped to 4.39 and the best p value was 1.1 × 10⁻⁵. Linked and/or associated variants ranged in frequency (0.0018-0.50) and type (coding, non-coding) and had little detectable linkage disequilibrium with rs200573126 (r² < 0.20). In addition, the two-point linkage approach empirically outperformed multipoint microsatellite and multipoint SNP analysis. In the absence of data for rs200573126, family-based linkage analysis using a moderately dense SNP dataset, including both common and low-frequency variants, resulted in stronger evidence for an adiponectin locus than association data alone. Thus, linkage analysis can be a useful tool to facilitate identification of high-impact genetic variants.
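For readers unfamiliar with the LOD scores quoted above, a LOD score is the base-10 logarithm of the likelihood ratio between linkage at recombination fraction θ and no linkage (θ = 0.5). The Python sketch below covers only the simplest textbook case, phase-known meioses counted as recombinant or non-recombinant. It is an illustration under that assumption and not the pedigree-based quantitative-trait analysis used in the study; the function name is hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """LOD score for phase-known meioses at recombination fraction theta.

    LOD(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n ),
    where r = recombinants and n = meioses.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Toy numbers, for illustration only: 1 recombinant in 10 meioses.
print(round(two_point_lod(1, 10, 0.1), 2))
```

By convention a LOD above 3 is taken as evidence of linkage, which is the nominal threshold the abstract applies.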
PURPOSE - Cataract is the leading cause of blindness in the world, and in the United States accounts for approximately 60% of Medicare costs related to vision. The purpose of this study was to identify genetic markers for age-related cataract through a genome-wide association study (GWAS).
METHODS - In the electronic medical records and genomics (eMERGE) network, we ran an electronic phenotyping algorithm on individuals in each of five sites with electronic medical records linked to DNA biobanks. We performed a GWAS using 530,101 SNPs from the Illumina 660W-Quad in a total of 7,397 individuals (5,503 cases and 1,894 controls). We also performed an age-at-diagnosis case-only analysis.
RESULTS - We identified several statistically significant associations with age-related cataract (45 SNPs) as well as age at diagnosis (44 SNPs). The 45 SNPs associated with cataract at p < 1 × 10⁻⁵ are in several interesting genes, including ALDOB, MAP3K1, and MEF2C. All have potential biologic relationships with cataracts.
CONCLUSIONS - This is the first genome-wide association study of age-related cataract, and several regions of interest have been identified. The eMERGE network has pioneered the exploration of genomic associations in biobanks linked to electronic health records, and this study is another example of the utility of such resources. Explorations of age-related cataract including validation and replication of the association results identified herein are needed in future studies.
PURPOSE - To determine if plasma metabolic profiles can detect differences between patients with neovascular age-related macular degeneration (NVAMD) and similarly-aged controls.
METHODS - Metabolomic analysis using liquid chromatography with Fourier-transform mass spectrometry (LC-FTMS) was performed on plasma samples from 26 NVAMD patients and 19 controls. Data were collected from mass/charge ratio (m/z) 85 to 850 on a Thermo LTQ-FT mass spectrometer, and metabolic features were extracted using an adaptive processing software package. Both non-transformed and log2 transformed data were corrected using Benjamini and Hochberg False Discovery Rate (FDR) to account for multiple testing. Orthogonal Partial Least Squares-Discriminant Analysis was performed to determine metabolic features that distinguished NVAMD patients from controls. Individual m/z features were matched to the Kyoto Encyclopedia of Genes and Genomes database and the Metlin metabolomics database, and metabolic pathways associated with NVAMD were identified using MetScape.
RESULTS - Of the 1680 total m/z features detected by LC-FTMS, 94 unique m/z features were significantly different between NVAMD patients and controls using FDR (q = 0.05). A comparison of these features to those found with log2 transformed data (n = 132, q = 0.2) revealed 40 features in common, reaffirming the involvement of certain metabolites. Such metabolites included di- and tripeptides, covalently modified amino acids, bile acids, and vitamin D-related metabolites. Correlation analysis revealed associations among certain significant features, and pathway analysis demonstrated broader changes in tyrosine metabolism, sulfur amino acid metabolism, and amino acids related to urea metabolism.
CONCLUSIONS - These data suggest that metabolomic analysis can identify a panel of individual metabolites that differ between NVAMD cases and controls. Pathway analysis can assess the involvement of certain metabolic pathways, such as tyrosine and urea metabolism, and can provide further insight into the pathophysiology of AMD.
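The Benjamini and Hochberg FDR correction named in the METHODS of the abstract above can be sketched in a few lines of Python. This is a generic implementation of the step-up procedure, not the authors' software (which the abstract does not identify); the function name and the toy p-values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order.

    For the p-value of rank k among m tests, q = min over ranks j >= k
    of p_(j) * m / j, capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Toy p-values, invented for illustration.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value <= 0.05]
print(q, significant)
```

Features whose q-value falls at or below the chosen threshold (0.05 or 0.2 in the abstract) would be reported as significant.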