The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
If you have any questions or comments, please contact us.
Structures of biomolecular systems are increasingly computed by integrative modeling. In this approach, a structural model is constructed by combining information from multiple sources, including varied experimental methods and prior models. In 2019, a Workshop was held as a Biophysical Society Satellite Meeting to assess progress and discuss further requirements for archiving integrative structures. The primary goal of the Workshop was to build consensus for addressing the challenges involved in creating common data standards, building methods for federated data exchange, and developing mechanisms for validating integrative structures. The summary of the Workshop and the recommendations that emerged are presented here.
Copyright © 2019.
State-of-the-art strategies for proteomics are not able to rapidly interrogate complex peptide mixtures in an untargeted manner with sensitive peptide and protein identification rates. We describe a data-independent acquisition (DIA) approach, microDIA (μDIA), that applies a novel tandem mass spectrometry (MS/MS) mass spectral deconvolution method to increase the specificity of tandem mass spectra acquired during proteomics experiments. Using the μDIA approach with a 10 min liquid chromatography gradient allowed detection of 3.1-fold more HeLa proteins than the results obtained from data-dependent acquisition (DDA) of the same samples. Additionally, we found the μDIA MS/MS deconvolution procedure is critical for resolving modified peptides with relatively small precursor mass shifts that cause the same peptide sequence in modified and unmodified forms to theoretically cofragment in the same raw MS/MS spectra. The μDIA workflow is implemented in the PROTALIZER software tool which fully automates tandem mass spectral deconvolution, queries every peptide with a library-free search algorithm against a user-defined protein database, and confidently identifies multiple peptides in a single tandem mass spectrum. We also benchmarked μDIA against DDA using a 90 min gradient analysis of HeLa and Escherichia coli peptides that were mixed in predefined quantitative ratios, and our results showed μDIA provided 24% more true positives at the same false positive rate.
Partial covalent interactions (PCIs) in proteins, which include hydrogen bonds, salt bridges, cation-π, and π-π interactions, contribute to thermodynamic stability and facilitate interactions with other biomolecules. Several score functions have been developed within the Rosetta protein modeling framework that identify and evaluate these PCIs through analyzing the geometry between participating atoms. However, we hypothesize that PCIs can be unified through a simplified electron orbital representation. To test this hypothesis, we have introduced orbital based chemical descriptors for PCIs into Rosetta, called the PCI score function. Optimal geometries for the PCIs are derived from a statistical analysis of high-quality protein structures obtained from the Protein Data Bank (PDB), and the relative orientation of electron deficient hydrogen atoms and electron-rich lone pair or π orbitals are evaluated. We demonstrate that nativelike geometries of hydrogen bonds, salt bridges, cation-π, and π-π interactions are recapitulated during minimization of protein conformation. The packing density of tested protein structures increased from the standard score function from 0.62 to 0.64, closer to the native value of 0.70. Overall, rotamer recovery improved when using the PCI score function (75%) as compared to the standard Rosetta score function (74%). The PCI score function represents an improvement over the standard Rosetta score function for protein model scoring; in addition, it provides a platform for future directions in the analysis of small molecule to protein interactions, which depend on partial covalent interactions.
BACKGROUND AND AIMS - Proteomics holds promise for individualizing cancer treatment. We analyzed to what extent the proteomic landscape of human colorectal cancer (CRC) is maintained in established CRC cell lines and the utility of proteomics for predicting therapeutic responses.
METHODS - Proteomic and transcriptomic analyses were performed on 44 CRC cell lines, compared against primary CRCs (n=95) and normal tissues (n=60), and integrated with genomic and drug sensitivity data.
RESULTS - Cell lines mirrored the proteomic aberrations of primary tumors, in particular for intrinsic programs. Tumor relationships of protein expression with DNA copy number aberrations and signatures of post-transcriptional regulation were recapitulated in cell lines. The 5 proteomic subtypes previously identified in tumors were represented among cell lines. Nonetheless, systematic differences between cell line and tumor proteomes were apparent, attributable to stroma, extrinsic signaling, and growth conditions. Contribution of tumor stroma obscured signatures of DNA mismatch repair identified in cell lines with a hypermutation phenotype. Global proteomic data showed improved utility for predicting both known drug-target relationships and overall drug sensitivity as compared with genomic or transcriptomic measurements. Inhibition of targetable proteins associated with drug responses further identified corresponding synergistic or antagonistic drug combinations. Our data provide evidence for CRC proteomic subtype-specific drug responses.
CONCLUSIONS - Proteomes of established CRC cell line are representative of primary tumors. Proteomic data tend to exhibit improved prediction of drug sensitivity as compared with genomic and transcriptomic profiles. Our integrative proteogenomic analysis highlights the potential of proteome profiling to inform personalized cancer medicine.
Copyright © 2017 AGA Institute. Published by Elsevier Inc. All rights reserved.
Over time, the long-lived proteins that are present throughout the human body deteriorate. Typically, they become racemized, truncated, and covalently cross-linked. One reaction responsible for age-related protein cross-linking in the lens was elucidated recently and shown to involve spontaneous formation of dehydroalanine (DHA) intermediates from phosphoserine. Cys residues are another potential source of DHA, and evidence for this was found in many lens crystallins. In the human lens, some sites were more prone to forming non-disulfide covalent cross-links than others. Foremost among them was Cys5 in βA4 crystallin. The reason for this enhanced reactivity was investigated using peptides. Oxidation of Cys to cystine was a prerequisite for DHA formation, and DHA production was accelerated markedly by the presence of a Lys, one residue separated from Cys5. Modeling and direct investigation of the N-terminal sequence of βA4 crystallin, as well as a variety of homologous peptides, showed that the epsilon amino group of Lys can promote DHA production by nucleophilic attack on the alpha proton of cystine. Once a DHA residue was generated, it could form intermolecular cross-links with Lys and Cys. In the lens, the most abundant cross-link involved Cys5 of βA4 crystallin attached via a thioether bond to glutathione. These findings illustrate the potential of Cys and disulfide bonds to act as precursors for irreversible covalent cross-links and the role of nearby amino acids in creating 'hotpsots' for the spontaneous processes responsible for protein degradation in aged tissues.
© 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.
To facilitate genome-based representation and analysis of proteomics data, we developed a new bioinformatics framework, proBAMsuite, in which a central component is the protein BAM (proBAM) file format for organizing peptide spectrum matches (PSMs)(1) within the context of the genome. proBAMsuite also includes two R packages, proBAMr and proBAMtools, for generating and analyzing proBAM files, respectively. Applying proBAMsuite to three recently published proteomics datasets, we demonstrated its utility in facilitating efficient genome-based sharing, interpretation, and integration of proteomics data. First, the interpretation of proteomics data is significantly enhanced with the rich genomic annotation information. Second, PSMs can be easily reannotated using user-specified gene annotation schemes and assembled into both protein and gene identifications. Third, using the genome as a common reference, proBAMsuite facilitates seamless proteomics and proteogenomics data integration. Finally, proBAM files can be readily visualized in genome browsers and thus bring proteomics data analysis to a general audience beyond the proteomics community. Results from this study establish proBAMsuite as a useful bioinformatics framework for proteomics and proteogenomics research.
© 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of genetic contribution to complex traits. However, it is challenging to link genetic variants with the perturbed functional effects on proteins encoded by such genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions that might have previously been uncharacterized, we now provide a high-resolution molecular inventory that should improve accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.
SEQUEST is a database-searching engine, which calculates the correlation score between observed spectrum and theoretical spectrum deduced from protein sequences stored in a flat text file, even though it is not a relational and object-oriental repository. Nevertheless, the SEQUEST score functions fail to discriminate between true and false PSMs accurately. Some approaches, such as PeptideProphet and Percolator, have been proposed to address the task of distinguishing true and false PSMs. However, most of these methods employ time-consuming learning algorithms to validate peptide assignments  . In this paper, we propose a fast algorithm for validating peptide identification by incorporating heterogeneous information from SEQUEST scores and peptide digested knowledge. To automate the peptide identification process and incorporate additional information, we employ l2 multiple kernel learning (MKL) to implement the current peptide identification task. Results on experimental datasets indicate that compared with state-of-the-art methods, i.e., PeptideProphet and Percolator, our data fusing strategy has comparable performance but reduces the running time significantly.
The two key steps for analyzing proteomic data generated by high-resolution MS are database searching and postprocessing. While the two steps are interrelated, studies on their combinatory effects and the optimization of these procedures have not been adequately conducted. Here, we investigated the performance of three popular search engines (SEQUEST, Mascot, and MS Amanda) in conjunction with five filtering approaches, including respective score-based filtering, a group-based approach, local false discovery rate (LFDR), PeptideProphet, and Percolator. A total of eight data sets from various proteomes (e.g., E. coli, yeast, and human) produced by various instruments with high-accuracy survey scan (MS1) and high- or low-accuracy fragment ion scan (MS2) (LTQ-Orbitrap, Orbitrap-Velos, Orbitrap-Elite, Q-Exactive, Orbitrap-Fusion, and Q-TOF) were analyzed. It was found combinations involving Percolator achieved markedly more peptide and protein identifications at the same FDR level than the other 12 combinations for all data sets. Among these, combinations of SEQUEST-Percolator and MS Amanda-Percolator provided slightly better performances for data sets with low-accuracy MS2 (ion trap or IT) and high accuracy MS2 (Orbitrap or TOF), respectively, than did other methods. For approaches without Percolator, SEQUEST-group performs the best for data sets with MS2 produced by collision-induced dissociation (CID) and IT analysis; Mascot-LFDR gives more identifications for data sets generated by higher-energy collisional dissociation (HCD) and analyzed in Orbitrap (HCD-OT) and in Orbitrap Fusion (HCD-IT); MS Amanda-Group excels for the Q-TOF data set and the Orbitrap Velos HCD-OT data set. Therefore, if Percolator was not used, a specific combination should be applied for each type of data set. Moreover, a higher percentage of multiple-peptide proteins and lower variation of protein spectral counts were observed when analyzing technical replicates using Percolator-associated combinations; therefore, Percolator enhanced the reliability for both identification and quantification. The analyses were performed using the specific programs embedded in Proteome Discoverer, Scaffold, and an in-house algorithm (BuildSummary). These results provide valuable guidelines for the optimal interpretation of proteomic results and the development of fit-for-purpose protocols under different situations.
Understanding proteomic differences underlying the different phenotypic classes of colon and rectal carcinoma is important and may eventually lead to a better assessment of clinical behavior of these cancers. We here present a comprehensive description of the proteomic data obtained from 90 colon and rectal carcinomas previously subjected to genomic analysis by The Cancer Genome Atlas (TCGA). Here, the primary instrument files and derived secondary data files are compiled and presented in forms that will allow further analyses of the biology of colon and rectal carcinoma. We also discuss new challenges in processing these large proteomic datasets for relevant proteins and protein variants.