The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of the genetic contribution to complex traits. However, it remains challenging to link genetic variants to their functional effects on the proteins encoded by the affected genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions, which may not have been characterized previously, we provide a high-resolution molecular inventory that should improve the accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.
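To make the SAAP cataloguing step above concrete, the following is a minimal sketch that lists single amino acid polymorphisms between a reference protein and the corresponding protein predicted for a second genotype, then tallies substitution types (for example T→S). It assumes the two sequences are already aligned and of equal length; insertions and deletions would need a real alignment step. The function names are hypothetical, and this is not the pipeline used in the study.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Saap = Tuple[int, str, str]  # (1-based position, reference residue, variant residue)


def find_saaps(ref: str, alt: str) -> List[Saap]:
    """List single amino acid polymorphisms between two equal-length, pre-aligned proteins."""
    if len(ref) != len(alt):
        raise ValueError("sequences differ in length; indels need a real alignment")
    return [(i + 1, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def tally_substitution_types(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count substitution types such as ('T', 'S') across many reference/variant protein pairs."""
    counts: Counter = Counter()
    for ref, alt in pairs:
        counts.update((r, a) for _, r, a in find_saaps(ref, alt))
    return counts


if __name__ == "__main__":
    print(find_saaps("MKTAYIAK", "MKSAYIVK"))  # [(3, 'T', 'S'), (7, 'A', 'V')]
```

Keying the tally on ordered (reference, variant) pairs keeps T→S and S→T as distinct substitution types, which is how a per-type frequency profile would usually be reported.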
Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a 'single state' design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared with MSD algorithms that enforce (constrain) an identical sequence on all states, this approach simplifies the energy landscape, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we designed "promiscuous", polyspecific proteins against all their binding partners and measured recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes.
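The core idea of RECON described above (each state keeps its own sequence while a restraint that grows over time pulls corresponding positions together) can be illustrated with a toy model. This sketch assumes hypothetical per-position energy tables in place of a real energy function and uses greedy per-position updates instead of the Monte Carlo packing and flexible-backbone sampling a real design run would use; it is not RECON's implementation.

```python
from typing import Dict, List, Sequence

# One dict per sequence position: residue -> energy (lower is better).
# These tables are hypothetical stand-ins for a real per-state energy function.
EnergyTable = List[Dict[str, float]]


def restrained_convergence(states: Sequence[EnergyTable],
                           weights: Sequence[float]) -> List[List[str]]:
    """Toy multi-state design with an incrementally increasing convergence restraint.

    Each state keeps its own sequence; at corresponding positions a penalty of
    `w` per mismatching state pushes the sequences toward a single consensus.
    """
    n_pos = len(states[0])
    # Start from each state's single-state optimum (no restraint).
    seqs = [[min(table[i], key=table[i].get) for i in range(n_pos)] for table in states]
    for w in weights:  # restraint weight ramps up across rounds
        for s, table in enumerate(states):
            for i in range(n_pos):
                others = [seqs[t][i] for t in range(len(states)) if t != s]

                def restrained_energy(aa: str) -> float:
                    mismatches = sum(1 for o in others if o != aa)
                    return table[i][aa] + w * mismatches

                seqs[s][i] = min(table[i], key=restrained_energy)
    return seqs


if __name__ == "__main__":
    state_a = [{"K": -2.0, "R": -1.5}]  # state A slightly prefers Lys
    state_b = [{"K": -0.5, "R": -2.0}]  # state B strongly prefers Arg
    print(restrained_convergence([state_a, state_b], weights=[0.0, 0.2, 1.0, 2.0]))
    # [['R'], ['R']]
```

With the weight held at zero the two states keep their individual optima (K and R); as the weight rises, both settle on R, the residue with the lower combined energy. Because the states are never forced to share a sequence early on, each can explore its own low-energy region before the restraint becomes strong.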
During CASP10 in summer 2012, we tested BCL::Fold for prediction of free modeling (FM) and template-based modeling (TBM) targets. BCL::Fold assembles the tertiary structure of a protein from predicted secondary structure elements (SSEs), omitting the more flexible loop regions early on. This approach enables sampling of the conformational space of larger proteins with more complex topologies. In preparation for CASP11, we analyzed the quality of CASP10 models throughout the prediction pipeline to assess BCL::Fold's ability to sample the native topology, how well native-like models could be identified by scoring and/or clustering approaches, and our ability to add loop regions and side chains to initial SSE-only models. The standout observation is that BCL::Fold sampled topologies de novo with a GDT_TS score > 33% for 12 of 18 test cases and with a topology score > 0.8 for 11 of 18. Despite this sampling success, significant challenges remain in the clustering and loop generation stages of the pipeline. The clustering approach employed for model selection often failed to identify the most native-like assembly of SSEs for further refinement and submission. For some β-strand proteins, model refinement also failed because the β-strands were not properly aligned to form hydrogen bonds, removing otherwise accurate models from the pool. Further, BCL::Fold frequently samples non-natural topologies that require loop regions to pass through the center of the protein.
© 2015 Wiley Periodicals, Inc.
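The BCL::Fold analysis above uses a GDT_TS threshold of 33% as its sampling criterion. As a rough sketch of what that number measures, GDT_TS averages the fraction of residues whose C-alpha atoms lie within 1, 2, 4, and 8 Å of their native positions. This version assumes the per-residue deviations come from a single, already computed superposition and counts a residue as within a cutoff when its deviation is ≤ the cutoff (an assumed boundary convention); the full GDT procedure instead searches for the superposition that maximizes the residue count separately for each cutoff, so it can only score equal or higher.

```python
from typing import Sequence

CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # Angstrom thresholds used by GDT_TS


def gdt_ts(deviations: Sequence[float]) -> float:
    """Simplified GDT_TS (percent) from per-residue C-alpha deviations.

    Assumes the deviations come from one fixed superposition of model onto native.
    """
    if not deviations:
        raise ValueError("need at least one residue")
    n = len(deviations)
    fractions = [sum(1 for d in deviations if d <= c) / n for c in CUTOFFS]
    return 100.0 * sum(fractions) / len(CUTOFFS)


if __name__ == "__main__":
    score = gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0])
    print(round(score, 1), score > 33.0)  # 50.0 True
```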
MOTIVATION - DNA and protein patterns are usefully represented by sequence logos. However, the methods for logo generation in common use lack a proper statistical basis and are suboptimal for recognizing functionally relevant alignment columns.
RESULTS - We redefine the information at a logo position as a per-observation multiple alignment log-odds score. Such scores are positive or negative, depending on whether a column's observations are better explained as arising from relatedness or chance. Within this framework, we propose distinct normalized maximum likelihood and Bayesian measures of column information. We illustrate these measures on High Mobility Group B (HMGB) box proteins and a dataset of enzyme alignments. Particularly in the context of protein alignments, our measures improve the discrimination of biologically relevant positions.
AVAILABILITY AND IMPLEMENTATION - Our new measures are implemented in an open-source Web-based logo generation program, which is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/logoddslogo/index.html. A stand-alone version of the program is also available from this site.
SUPPLEMENTARY INFORMATION - Supplementary data are available at Bioinformatics online.
Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
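The sequence-logo abstract above redefines column information as a per-observation log-odds score that can be positive or negative. The abstract does not give the formulas for its normalized maximum likelihood and Bayesian measures, so the following is only one simple Bayesian variant, built on an assumed symmetric Dirichlet prior: the "related" hypothesis integrates over an unknown column distribution, the "chance" hypothesis draws residues from background frequencies, and the log ratio is divided by the number of observations. The function name and the pseudocount default are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def column_log_odds(column: Sequence[str], background: Dict[str, float],
                    pseudocount: float = 1.0) -> float:
    """Per-observation log-odds (nats) of 'related' vs 'chance' for one alignment column.

    'Related': residues drawn i.i.d. from an unknown column distribution with a
    symmetric Dirichlet(pseudocount) prior, integrated out analytically.
    'Chance': residues drawn i.i.d. from the background frequencies.
    """
    counts = Counter(column)
    unknown = set(counts) - set(background)
    if unknown:
        raise ValueError(f"residues missing from background: {sorted(unknown)}")
    n = len(column)
    if n == 0:
        raise ValueError("empty column")
    alpha_total = pseudocount * len(background)
    # Log marginal likelihood of the ordered observations under the Dirichlet prior.
    log_related = math.lgamma(alpha_total) - math.lgamma(alpha_total + n)
    for residue in background:
        log_related += math.lgamma(pseudocount + counts[residue]) - math.lgamma(pseudocount)
    # Log likelihood of the same observations under background frequencies.
    log_chance = sum(c * math.log(background[r]) for r, c in counts.items())
    return (log_related - log_chance) / n


if __name__ == "__main__":
    uniform = {b: 0.25 for b in "ACGT"}
    print(column_log_odds("AAAA", uniform) > 0)  # True: conserved column
    print(column_log_odds("ACGT", uniform) < 0)  # True: no better than chance
```

A plug-in estimate that simply uses observed column frequencies could never go negative, because the resulting score is a Kullback-Leibler divergence; integrating over the prior is what lets a column such as "ACGT" score below zero.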
The allophanate hydrolase from Pseudomonas sp. strain ADP was expressed and purified, and a tryptic digest fragment was subsequently identified, expressed and purified. This 50 kDa construct retained amidase activity and was crystallized. The crystals diffracted to 2.5 Å resolution and belonged to space group P2₁, with unit-cell parameters a = 82.4, b = 179.2, c = 112.6 Å, β = 106.6°.
OBJECTIVES - The aim of this study was to test the hypothesis that rare variants are associated with drug-induced long QT interval syndrome (diLQTS) and torsades de pointes.
BACKGROUND - diLQTS is associated with the potentially fatal arrhythmia torsades de pointes. The contribution of rare genetic variants to the underlying genetic framework predisposing to diLQTS has not been systematically examined.
METHODS - We performed whole-exome sequencing on 65 diLQTS patients and 148 drug-exposed control subjects of European descent. We used rare variant analyses (variable threshold and sequence kernel association test) and gene-set analyses to identify genes enriched with rare amino acid coding (AAC) variants associated with diLQTS. Significant associations were reanalyzed by comparing diLQTS patients with 515 ethnically matched control subjects from the National Heart, Lung, and Blood Institute Grand Opportunity Exome Sequencing Project.
RESULTS - Compared with drug-exposed control subjects, diLQTS patients were enriched for rare variants in 7 genes according to the sequence kernel association test or variable threshold analysis (p < 0.001). Of these, we replicated the diLQTS associations for KCNE1 and ACN9 using the 515 Exome Sequencing Project control subjects (p < 0.05). In addition, 37% of diLQTS patients carried 1 or more rare AAC variants in a predefined set of 7 congenital long QT interval syndrome (cLQTS) genes encoding potassium channels or channel modulators (KCNE1, KCNE2, KCNH2, KCNJ2, KCNJ5, KCNQ1, AKAP9), compared with 21% of control subjects (p = 0.009).
CONCLUSIONS - By combining whole-exome sequencing with aggregated rare variant analyses, we implicate rare variants in KCNE1 and ACN9 as risk factors for diLQTS. Moreover, diLQTS patients carried a greater burden of rare AAC variants in cLQTS genes encoding potassium channel modulators, supporting the idea that multiple rare variants, notably across cLQTS genes, predispose to diLQTS.
Copyright © 2014 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
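The gene-set result in the diLQTS study above compares the fraction of cases and controls carrying at least 1 rare AAC variant in the 7 cLQTS genes. The abstract does not name the statistical test behind p = 0.009, so this sketch assumes a two-sided Fisher's exact test on carrier counts purely for illustration. The genotype input format and the helper names are hypothetical, and the example data are toy values, not study data.

```python
from math import comb
from typing import Dict, Iterable, Set


def count_carriers(genotypes: Dict[str, Set[str]], gene_set: Iterable[str]) -> int:
    """Number of individuals with >= 1 rare AAC variant in any gene of the set.

    `genotypes` maps individual ID -> genes in which that person carries a
    rare amino acid coding variant (a hypothetical input format).
    """
    genes = set(gene_set)
    return sum(1 for carried in genotypes.values() if carried & genes)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls; columns: carriers, non-carriers.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table no more probable than the observed one (small relative
    # tolerance so ties are not lost to floating-point rounding).
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)))


if __name__ == "__main__":
    gene_set = ["KCNE1", "KCNE2", "KCNH2", "KCNJ2", "KCNJ5", "KCNQ1", "AKAP9"]
    # Toy genotypes, not study data.
    cases = {"p1": {"KCNE1"}, "p2": set(), "p3": {"AKAP9", "TTN"}}
    controls = {"c1": set(), "c2": {"TTN"}, "c3": set(), "c4": {"KCNQ1"}}
    a = count_carriers(cases, gene_set)
    c = count_carriers(controls, gene_set)
    p = fisher_exact_two_sided(a, len(cases) - a, c, len(controls) - c)
    print(a, c, round(p, 3))  # 2 1 0.486
```

Collapsing each person to carrier or non-carrier is what makes this a burden-style test: it asks whether cases more often carry any qualifying variant in the set, not which specific variant they carry.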
A growing body of genomic data on human cancers poses the critical question of how genomic variations translate to cancer phenotypes. We used standardized shotgun proteomics and targeted protein quantitation platforms to analyze a panel of 10 colon cancer cell lines differing by mutations in DNA mismatch repair (MMR) genes. In addition, we performed transcriptome sequencing (RNA-seq) to enable detection of protein sequence variants from the proteomic data. Biological replicate cultures yielded highly consistent proteomic inventories, with a cumulative total of 6,513 protein groups at a protein false discovery rate of 3.17% across all cell lines. Networks of coexpressed proteins with differential expression based on MMR status revealed effects on protein folding, turnover and transport, on cellular metabolism, and on DNA and RNA synthesis and repair. Analysis of variant amino acid sequences suggested higher stability of proteins affected by naturally occurring germline polymorphisms than of proteins affected by somatic protein sequence changes. The data provide evidence for multisystem adaptation to MMR deficiency, with a stress response that targets misfolded proteins for degradation through the ubiquitin-dependent proteasome pathway. Enrichment analysis suggested an epithelial-to-mesenchymal transition in RKO cells, as evidenced by their increased motility and invasiveness compared with SW480 cells. The observed proteomic profiles demonstrate previously unknown consequences of altered DNA repair and provide an expanded basis for mechanistic interpretation of MMR phenotypes.
The proteome informatics research group of the Association of Biomolecular Resource Facilities conducted a study to assess the community's ability to detect and characterize peptides bearing a range of biologically occurring post-translational modifications when present in a complex peptide background. A data set derived from a mixture of synthetic peptides with biologically occurring modifications, combined with a yeast whole cell lysate as background, was distributed to a large group of researchers, and their results were collectively analyzed. The results from the twenty-four participants, who represented a broad spectrum of experience levels with this type of data analysis, produced several important observations. First, there is significantly more variability in the ability to assess whether a result is significant than in the ability to determine the correct answer. Second, labile post-translational modifications, particularly tyrosine sulfation, present a challenge for most researchers. Finally, many tools are being employed for modification site localization, but researchers are currently unsure of the reliability of the results these programs produce.
Membrane protein structure determination remains a challenging endeavor. Computational methods that predict membrane protein structure from sequence can potentially aid structure determination for such difficult target proteins. The de novo protein structure prediction method BCL::Fold rapidly assembles secondary structure elements into three-dimensional models. Here, we describe modifications to the algorithm, together named BCL::MP-Fold, that allow it to simulate membrane protein folding. Models are built into a static membrane object and are evaluated using a knowledge-based energy potential that has been modified to account for the membrane environment. Additionally, a symmetry folding mode allows for the prediction of obligate homomultimers, a common property among membrane proteins. In a benchmark test of 40 proteins of known structure, the method sampled the correct topology in 34 cases. This demonstrates that the algorithm can accurately predict protein topology without the need for large multiple sequence alignments, homologous template structures, or experimental restraints.
Copyright © 2013 Elsevier Ltd. All rights reserved.
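BCL::MP-Fold's symmetry folding mode for obligate homomultimers is not detailed in the abstract. A common way to handle symmetric oligomers is to model one subunit and generate the others by symmetry; the sketch below does this for cyclic (C_n) symmetry. It assumes the membrane normal is the z axis and that the symmetry axis coincides with it and passes through the origin, and the function name is hypothetical rather than part of BCL.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def cyclic_copies(subunit: List[Point], n: int) -> List[List[Point]]:
    """Build an n-fold (C_n) homo-oligomer by rotating one subunit about the z axis.

    Assumes the membrane normal is the z axis and the symmetry axis lies along it,
    so every copy keeps the same depth (z) in the membrane.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    copies = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        copies.append([(cos_t * x - sin_t * y, sin_t * x + cos_t * y, z)
                       for x, y, z in subunit])
    return copies


if __name__ == "__main__":
    for copy in cyclic_copies([(1.0, 0.0, 2.0)], 4):
        print([tuple(round(v, 6) + 0.0 for v in p) for p in copy])
```

Because only one subunit's coordinates are free, the search space stays the size of a monomer while the scoring can still see all inter-subunit contacts, and keeping the axis along the membrane normal preserves each copy's insertion depth.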
It is generally assumed that the MHC class I antigen (Ag)-processing (CAP) machinery - which supplies peptides for presentation by class I molecules - plays no role in class II-restricted presentation of cytoplasmic Ags. In striking contrast to this assumption, we previously reported that proteasome inhibition, TAP deficiency or ERAAP deficiency led to dramatically altered T helper (Th)-cell responses to allograft (HY) and microbial (Listeria monocytogenes) Ags. Herein, we tested whether altered Ag processing and presentation, an altered CD4(+) T-cell repertoire, or both underlay the above finding. We found that TAP deficiency and ERAAP deficiency dramatically altered the quality of class II-associated self peptides, suggesting that the CAP machinery impacts class II-restricted Ag processing and presentation. Consistent with altered self peptidomes, the CD4(+) T-cell receptor repertoire of mice deficient in the CAP machinery differed substantially from that of WT animals, resulting in altered CD4(+) T-cell Ag recognition patterns. These data suggest that TAP and ERAAP sculpt the class II-restricted peptidome, impacting the CD4(+) T-cell repertoire and ultimately altering Th-cell responses. Together with our previous findings, these data suggest that multiple CAP machinery components sequester or degrade MHC class II-restricted epitopes that would otherwise be capable of eliciting functional Th-cell responses.
© 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.