The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
MOTIVATION - The development of cost-effective next-generation sequencing methods has spurred the development of high-throughput bioinformatics tools for detection of sequence variation. With many disparate variant-calling algorithms available, investigators must ask, 'Which method is best for my data?' Machine learning research has shown that so-called ensemble methods that combine the output of multiple models can dramatically improve classifier performance. Here we describe a novel variant-calling approach based on an ensemble of variant-calling algorithms, which we term the Consensus Genotyper for Exome Sequencing (CGES). CGES uses a two-stage voting scheme among four algorithm implementations. While our ensemble method can accept variants generated by any variant-calling algorithm, we used GATK2.8, SAMtools, FreeBayes and Atlas-SNP2 in building CGES because of their performance, widespread adoption and diverse but complementary algorithms.
RESULTS - We apply CGES to 132 samples sequenced at the Hudson Alpha Institute for Biotechnology (HAIB, Huntsville, AL) using the Nimblegen Exome Capture and Illumina sequencing technology. Our sample set consisted of 40 complete trios, two families of four, one parent-child duo and two unrelated individuals. CGES yielded the fewest total variant calls (N(CGES) = 139,897), the highest Ts/Tv ratio (3.02), the lowest Mendelian error rate across all genotypes (0.028%), the highest rediscovery rate from the Exome Variant Server (EVS; 89.3%) and 1000 Genomes (1KG; 84.1%) and the highest positive predictive value (PPV; 96.1%) for a random sample of previously validated de novo variants. We describe these and other quality control (QC) metrics from consensus data and explain how the CGES pipeline can be used to generate call sets of varying quality stringency, including consensus calls present across all four algorithms, calls that are consistent across any three out of four algorithms, calls that are consistent across any two out of four algorithms or a more liberal set of all calls made by any algorithm.
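The stringency tiers described above (four of four, three of four, two of four, or any caller) can be illustrated with a simple vote count. The following is a minimal sketch only: it uses a single-stage vote rather than the two-stage scheme of CGES, and the function name, caller labels, variant keys and genotype strings are all hypothetical, not taken from the CGES code.

```python
from collections import Counter, defaultdict


def consensus_calls(calls_by_caller, min_callers):
    """Keep variants whose best-supported genotype has >= min_callers votes.

    calls_by_caller maps a caller label to {variant_key: genotype}.
    variant_key is assumed here to be a (chrom, pos, ref, alt) tuple.
    Variants whose top two genotypes tie are dropped as ambiguous.
    """
    votes = defaultdict(Counter)
    for calls in calls_by_caller.values():
        for variant, genotype in calls.items():
            votes[variant][genotype] += 1

    consensus = {}
    for variant, counts in votes.items():
        ranked = counts.most_common(2)
        top_genotype, top_votes = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_votes
        if top_votes >= min_callers and not tied:
            consensus[variant] = top_genotype
    return consensus


# Toy input: three made-up variants and four placeholder callers.
V1 = ("1", 1000, "A", "G")
V2 = ("1", 2000, "C", "T")
V3 = ("2", 3000, "G", "A")
EXAMPLE_CALLS = {
    "caller_a": {V1: "0/1", V2: "1/1", V3: "0/1"},
    "caller_b": {V1: "0/1", V2: "1/1"},
    "caller_c": {V1: "0/1", V2: "0/1"},
    "caller_d": {V1: "0/1"},
}

if __name__ == "__main__":
    for threshold in (4, 3, 2, 1):
        kept = consensus_calls(EXAMPLE_CALLS, threshold)
        print(threshold, sorted(kept))
```

Lowering min_callers trades specificity for sensitivity: in the toy input only V1 survives at thresholds of four or three, V2 joins at two, and V3 appears only in the most liberal set.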
AVAILABILITY AND IMPLEMENTATION - To enable accessible, efficient and reproducible analysis, we implement CGES both as a stand-alone command line tool available for download in GitHub and as a set of Galaxy tools and workflows configured to execute on parallel computers.
SUPPLEMENTARY INFORMATION - Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press. All rights reserved.
Androgen receptor (AR) action throughout prostate development and in maintenance of the prostatic epithelium is partly controlled by interactions between AR and forkhead box (FOX) transcription factors, particularly FOXA1. We sought to identify additional FOXA1 binding partners that may mediate prostate-specific gene expression. Here we identify the nuclear factor I (NFI) family of transcription factors as novel FOXA1 binding proteins. All four family members (NFIA, NFIB, NFIC, and NFIX) can interact with FOXA1, and knockdown studies in androgen-dependent LNCaP cells determined that modulating expression of NFI family members results in changes in AR target gene expression. This effect is probably mediated by binding of NFI family members to AR target gene promoters, because chromatin immunoprecipitation (ChIP) studies found that NFIB bound to the prostate-specific antigen enhancer. Förster resonance energy transfer studies revealed that FOXA1 is capable of bringing AR and NFIX into proximity, indicating that FOXA1 facilitates the AR and NFI interaction by bridging the complex. To determine the extent to which NFI family members regulate AR/FOXA1 target genes, motif analysis of publicly available data for ChIP followed by sequencing was undertaken. This analysis revealed that 34.4% of peaks bound by AR and FOXA1 contain NFI binding sites. Validation of 8 of these peaks by ChIP revealed that NFI family members can bind 6 of these predicted genomic elements, and 4 of the 8 associated genes undergo gene expression changes as a result of individual NFI knockdown. These observations suggest that NFI regulation of FOXA1/AR action is a frequent event, with individual family members playing distinct roles in AR target gene expression.
A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large-scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutated genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method to >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
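For readers unfamiliar with Random Walk with Restart, the sketch below shows the standard iteration p(t+1) = (1 - r) W p(t) + r p(0) on a toy undirected network, where W spreads each node's probability equally among its neighbours. It is an illustration of the general algorithm only, not VarWalker's implementation: the function name, the restart probability of 0.5, the tolerance and the four-node network are all assumptions made for this example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> set of neighbours (undirected; every
               node must have at least one neighbour).
    seeds:     collection of start nodes that share the restart probability.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart step: return to the seed distribution with probability `restart`.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk step: each node passes the rest of its mass equally to neighbours.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network A - B - C - D with placeholder node names.
TOY_NETWORK = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

if __name__ == "__main__":
    scores = random_walk_with_restart(TOY_NETWORK, seeds={"A"})
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

Nodes closer to the seed receive higher scores, which is the property that makes the algorithm useful for ranking close interactors of a mutated gene.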
Effective CD8(+) T cell responses depend on presentation of a stable peptide repertoire by MHC class I (MHC I) molecules on the cell surface. The overall quality of peptide-MHC I complexes (pMHC I) is determined by poorly understood mechanisms that generate and load peptides with appropriate consensus motifs onto MHC I. In this article, we show that both tapasin (Tpn), a key component of the peptide loading complex, and the endoplasmic reticulum aminopeptidase associated with Ag processing (ERAAP) are quintessential editors of distinct structural features of the peptide repertoire. We carried out reciprocal immunization of wild-type mice with cells from Tpn- or ERAAP-deficient mice. Specificity analysis of T cell responses showed that absence of Tpn or ERAAP independently altered the peptide repertoire by causing loss as well as gain of new pMHC I. Changes in amino acid sequences of MHC-bound peptides revealed that ERAAP and Tpn, respectively, defined the characteristic amino and carboxy termini of canonical MHC I peptides. Thus, the optimal pMHC I repertoire is produced by two distinct peptide editing steps in the endoplasmic reticulum.
The lineage-specific basic helix-loop-helix transcription factor Ptf1a is a critical driver for development of both the pancreas and nervous system. How one transcription factor controls diverse programs of gene expression is a fundamental question in developmental biology. To uncover molecular strategies for the program-specific functions of Ptf1a, we identified bound genomic regions in vivo during development of both tissues. Most regions bound by Ptf1a are specific to each tissue, lie near genes needed for proper formation of each tissue, and coincide with regions of open chromatin. The specificity of Ptf1a binding is encoded in the DNA surrounding the Ptf1a-bound sites, because these regions are sufficient to direct tissue-restricted reporter expression in transgenic mice. Fox and Sox factors were identified as potential lineage-specific modifiers of Ptf1a binding, since binding motifs for these factors are enriched in Ptf1a-bound regions in pancreas and neural tube, respectively. Of the Fox factors expressed during pancreatic development, Foxa2 plays a major role. Indeed, Ptf1a and Foxa2 colocalize in embryonic pancreatic chromatin and can act synergistically in cell transfection assays. Together, these findings indicate that lineage-specific chromatin landscapes likely constrain the DNA binding of Ptf1a, and they identify Fox and Sox gene families as part of this process.
Research during the past decade has seen significant progress in the understanding of the genetic architecture of autism spectrum disorders (ASDs), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time, this research has highlighted ongoing challenges. Here we address the enormous impact of high-throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multisite collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly.
Copyright © 2012 Elsevier Inc. All rights reserved.
Cooperativity between oncogenic mutations is recognized as a fundamental feature of malignant transformation, and it may be mediated by synergistic regulation of the expression of pro- and antitumorigenic target genes. However, the mechanisms by which oncogenes and tumor suppressors coregulate downstream targets and pathways remain largely unknown. Here, we used ChIP coupled to massively parallel sequencing (ChIP-seq) and gene expression profiling in mouse prostates to identify direct targets of the tumor suppressor Nkx3.1. Further analysis indicated that a substantial fraction of Nkx3.1 target genes are also direct targets of the oncoprotein Myc. We also showed that Nkx3.1 and Myc bound to and crossregulated shared target genes in mouse and human prostate epithelial cells and that Nkx3.1 could oppose the transcriptional activity of Myc. Furthermore, loss of Nkx3.1 cooperated with concurrent overexpression of Myc to promote prostate cancer in transgenic mice. In human prostate cancer patients, dysregulation of shared NKX3.1/MYC target genes was associated with disease relapse. Our results indicate that NKX3.1 and MYC coregulate prostate tumorigenesis by converging on, and crossregulating, a common set of target genes. We propose that coregulation of target gene expression by oncogenic/tumor suppressor transcription factors may represent a general mechanism underlying the cooperativity of oncogenic mutations during tumorigenesis.
Extensive networks of tertiary interactions give rise to unique, highly organized domain architectures that characterize the three-dimensional structure of large RNA molecules. Formed by stacked layers of a near-planar arrangement of contiguous coaxial helices, large RNA molecules are relatively flat in overall shape. The functional core of these molecules is stabilized by a diverse set of tertiary interaction motifs that often bring together distant regions of conserved nucleotides. Although homologous RNAs from different organisms can be structurally diverse, they adopt a structurally conserved functional core that includes preassembled active and/or substrate binding sites. These findings broaden our understanding of RNA folding and tertiary structure stabilization, illustrating how large, complex RNAs assemble into unique structures to perform recognition and catalysis.
Copyright © 2011 Elsevier Ltd. All rights reserved.
Proapoptotic BH3 interacting domain death agonist (Bid), a BH3-only Bcl-2 family member, is situated at the interface between the DNA damage response and apoptosis, with roles in death receptor-induced apoptosis as well as cell cycle checkpoints following DNA damage (1, 2, 3). In this study, we demonstrate that Bid functions at the level of the sensor complex in the Atm and Rad3-related (Atr)-directed DNA damage response. Bid is found with replication protein A (RPA) in nuclear foci and associates with the Atr/Atr-interacting protein (Atrip)/RPA complex following replicative stress. Furthermore, Bid-deficient cells show an impaired response to replicative stress, manifested by reduced accumulation of Atr and Atrip on chromatin and at DNA damage foci, reduced recovery of DNA synthesis following replicative stress, and decreased checkpoint kinase 1 activation and RPA phosphorylation. These results establish a direct role for the BH3-only Bcl-2 family member, Bid, acting at the level of the damage sensor complex to amplify the Atr-directed cellular response to replicative DNA damage.
p53 and p63 belong to a family of sequence-specific transcription factors regulating key cellular processes. Differential composition of the p53 and p63 DNA-binding sites may contribute to distinct functions of these protein homologues. We used SELEX (systematic evolution of ligands by exponential enrichment) methodology to identify nucleic acid ligands for p63. We found that p63 bound preferentially to DNA fragments conforming to the 20 bp sequence 5'-RRRC(A/G)(A/T)GYYYRRRC(A/T)(C/T)GYYY-3'. Relative to the p53 consensus, the p63 consensus DNA-binding site (DBS) was more degenerate, particularly at positions 10 and 11, and was enriched for A/G at position 5 and C/T at position 16 of the consensus. The differences in DNA-binding site preferences between p63 and p53 influenced their ability to activate transcription from select response elements (REs) in cells. A computer algorithm, p63MH, was developed to find candidate p63-binding motifs on input sequences. We identified genes responsive to p63 regulation that contain functional p63 REs. Our results suggest that the sequence composition of REs could be one contributing factor to target gene discrimination between p63 and p53.
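The 20 bp consensus quoted above can be written as a regular expression using the IUPAC codes R = A/G and Y = C/T. The sketch below scans the forward strand of an input sequence for exact matches to that consensus. It is not the p63MH algorithm, whose scoring is not described here; the function name and the example sequence are hypothetical, and a practical scan would also search the reverse complement and allow mismatches.

```python
import re

# 5'-RRRC(A/G)(A/T)GYYYRRRC(A/T)(C/T)GYYY-3' with R = [AG] and Y = [CT].
P63_CONSENSUS = "[AG]{3}C[AG][AT]G[CT]{3}[AG]{3}C[AT][CT]G[CT]{3}"

# A lookahead wrapper lets overlapping candidate sites be reported.
_P63_PATTERN = re.compile("(?=(" + P63_CONSENSUS + "))")


def find_p63_sites(sequence):
    """Return (start, site) pairs for exact forward-strand consensus matches.

    start is a 0-based index into the input sequence.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in _P63_PATTERN.finditer(seq)]


# Made-up sequence with one consensus-conforming site embedded at index 4.
EXAMPLE_SEQUENCE = "TTTT" + "AAACATGTTTGGGCATGCCC" + "ACGT"

if __name__ == "__main__":
    for start, site in find_p63_sites(EXAMPLE_SEQUENCE):
        print(start, site)
```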