The publication data currently available has been vetted by Vanderbilt faculty, staff, administrators and trainees. The data itself is retrieved directly from NCBI's PubMed and is automatically updated on a weekly basis to ensure accuracy and completeness.
The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We demonstrate the performance of the system by sequencing three bacterial genomes, and its robustness and scalability by producing ion chips with up to 10 times as many sensors and by sequencing a human genome.
Elucidation of protein-protein interactions can provide new knowledge on protein function. Enrichments of affinity-tagged (or "bait") proteins with interaction partners generally include background, nonspecific protein artifacts. Furthermore, in vivo bait expression may introduce additional artifacts arising from altered physiology or metabolism. In this study, we compared these effects for chromosome and plasmid encoding strategies for bait proteins in two microbes: Escherichia coli and Rhodopseudomonas palustris. Differential metabolic labeling of strains expressing bait protein relative to the wild-type strain in each species allowed comparison by liquid chromatography tandem mass spectrometry (LC-MS-MS). At the local level of the protein complex, authentic interacting proteins of RNA polymerase (RNAP) were successfully discerned from artifactual proteins by the isotopic differentiation of interactions as random or targeted (I-DIRT, Tackett, A. J.; et al. J. Proteome Res. 2005, 4, 1752-1756). To investigate global effects of bait protein production, we compared proteomes from strains harboring a plasmid encoding an affinity-tagged subunit (RpoA) of RNAP with the corresponding wild-type strains. The RpoA abundance ratios of 0.8 for R. palustris and 1.7 for E. coli in plasmid strains versus wild-type indicated only slightly altered expression. While most other proteins also showed no appreciable difference in abundance, several that did show altered levels were involved in amino acid metabolism. Measurements at both local and global levels proved useful for evaluating in vitro and in vivo artifacts of plasmid-encoding strategies for bait protein expression.
One of the most promising methods for large-scale studies of protein interactions is isolation of an affinity-tagged protein with its in vivo interaction partners, followed by mass spectrometric identification of the copurified proteins. Previous studies have generated affinity-tagged proteins using genetic tools or cloning systems that are specific to a particular organism. To enable protein-protein interaction studies across a wider range of Gram-negative bacteria, we have developed a methodology based on expression of affinity-tagged "bait" proteins from a medium copy-number plasmid. This construct is based on a broad-host-range vector backbone (pBBR1MCS5). The vector has been modified to incorporate the Gateway DEST vector recombination region, to facilitate cloning and expression of fusion proteins bearing a variety of affinity, fluorescent, or other tags. We demonstrate this methodology by characterizing interactions among subunits of the DNA-dependent RNA polymerase complex in two metabolically versatile Gram-negative microbial species of environmental interest, Rhodopseudomonas palustris CGA010 and Shewanella oneidensis MR-1. Results compared favorably with those for both plasmid and chromosomally encoded affinity-tagged fusion proteins expressed in a model organism, Escherichia coli.
Affinity isolation of protein complexes followed by protein identification by LC-MS/MS is an increasingly popular approach for mapping protein interactions. However, systematic and random assay errors from multiple sources must be considered to confidently infer authentic protein-protein interactions. To address this issue, we developed a general, robust statistical method for inferring authentic interactions from protein prey-by-bait frequency tables using a binomial-based likelihood ratio test (LRT) coupled with Bayes' Odds estimation. We then applied our LRT-Bayes' algorithm experimentally using data from protein complexes isolated from Rhodopseudomonas palustris. Our algorithm, in conjunction with the experimental protocol, inferred with high confidence authentic interacting proteins from abundant, stable complexes, but few or no authentic interactions for lower-abundance complexes. The algorithm can discriminate against a background of prey proteins that are detected in association with a large number of baits as an artifact of the measurement. We conclude that the experimental protocol including the LRT-Bayes' algorithm produces results with high confidence but moderate sensitivity. We also found that Monte Carlo simulation is a feasible tool for checking modeling assumptions, estimating parameters, and evaluating the significance of results in protein association studies.
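The general idea of a binomial likelihood ratio test followed by an odds update can be sketched as below. This is only an illustration under stated assumptions, not the published LRT-Bayes' algorithm: the counts `k`, `n`, `K`, `N`, the function names, and the default `prior_odds` are all hypothetical, and multiplying the prior odds by the maximized likelihood ratio is a crude stand-in for a proper Bayes factor.

```python
import math

def binom_loglik(k, n, p):
    """Binomial log-likelihood of k detections in n isolations.
    The binomial coefficient is omitted because it cancels in the ratio."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def lrt_statistic(k, n, K, N):
    """2 * log likelihood ratio comparing 'prey rate differs for this bait'
    against 'prey rate is the same for this bait and the background'.
    k of n: prey detections in isolations of the bait (hypothetical counts).
    K of N: prey detections in isolations of all other baits."""
    p_pool = (k + K) / (n + N)
    alt = binom_loglik(k, n, k / n) + binom_loglik(K, N, K / N)
    null = binom_loglik(k, n, p_pool) + binom_loglik(K, N, p_pool)
    return max(0.0, 2.0 * (alt - null))

def lrt_p_value(stat):
    """Chi-square (1 degree of freedom) tail probability of the statistic."""
    return math.erfc(math.sqrt(stat / 2.0))

def posterior_odds(k, n, K, N, prior_odds=0.01):
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio.
    Assumption: the maximized likelihood ratio is used as the evidence term,
    and a prey that is not enriched with the bait leaves the prior unchanged."""
    if k / n <= K / N:
        return prior_odds
    return prior_odds * math.exp(lrt_statistic(k, n, K, N) / 2.0)
```

A prey seen in 9 of 10 isolations of one bait but in only 5 of 100 isolations of other baits gets a large statistic, whereas a prey detected at the same rate everywhere (the "sticky" background case) gets a statistic of zero.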
While genome sequencing is becoming ever more routine, genome annotation remains a challenging process. Identification of the coding sequences within the genomic milieu presents a tremendous challenge, especially for eukaryotes with their complex gene architectures. Here, we present a method to assist the annotation process through the use of proteomic data and bioinformatics. Mass spectra of digested protein preparations of the organism of interest were acquired and searched against a protein database created by a six-frame translation of the genome. The identified peptides were mapped back to the genome, compared to the current annotation, and then categorized as supporting or extending the current genome annotation. We named the classified peptides Expressed Peptide Tags (EPTs). The well-annotated bacterium Rhodopseudomonas palustris was used as a control for the method and showed a high degree of correlation between EPT mapping and the current annotation, with 86% of the EPTs confirming existing gene calls and less than 1% of the EPTs expanding on the current annotation. The eukaryotic plant pathogens Phytophthora ramorum and Phytophthora sojae, whose genomes have been recently sequenced and are much less well-annotated, were also subjected to this method. A series of algorithmic steps was taken to increase the confidence of EPT identification for these organisms, including generation of smaller subdatabases to be searched against, and definition of EPT criteria that accommodate the more complex eukaryotic gene architecture. As expected, the analysis of the Phytophthora species showed less correlation between EPT mapping and their current annotation. While approximately 76% of Phytophthora EPTs supported the current annotation, a portion of them (7.7% and 12.9% for P. ramorum and P. sojae, respectively) suggested modification to current gene calls or identified novel genes that were missed by the current genome annotation of these organisms.
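For readers unfamiliar with the six-frame translation mentioned above, a minimal sketch follows. It assumes the standard genetic code and a plain DNA string; the function names and the frame labels (`+1` to `-3`) are illustrative choices, not taken from the study, and the database-building steps of the actual method (splitting at stop codons, subdatabase generation) are not shown.

```python
# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translation(dna):
    """Translate all three reading frames on both strands.
    Stop codons are reported as '*'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCCTAA")["+1"]` gives `"MA*"`. Searching spectra against all six frames is what allows peptides to be found outside existing gene calls.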
A profile likelihood algorithm is proposed for quantitative shotgun proteomics to infer the abundance ratios of proteins from the abundance ratios of isotopically labeled peptides derived from proteolysis. Previously, we have shown that the estimation variability and bias of peptide abundance ratios can be predicted from their profile signal-to-noise ratios. Given multiple quantified peptides for a protein, the profile likelihood algorithm probabilistically weighs the peptide abundance ratios by their inferred estimation variability, accounts for their expected estimation bias, and suppresses contribution from outliers. This algorithm yields maximum likelihood point estimation and profile likelihood confidence interval estimation of protein abundance ratios. This point estimator is more accurate than an estimator based on the average of peptide abundance ratios. The confidence interval estimation provides an "error bar" for each protein abundance ratio that reflects its estimation precision and statistical uncertainty. The accuracy of the point estimation and the precision and confidence level of the interval estimation were benchmarked with standard mixtures of isotopically labeled proteomes. The profile likelihood algorithm was integrated into a quantitative proteomics program, called ProRata, freely available at www.MSProRata.org.
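The flavor of a likelihood-based protein ratio estimate can be conveyed with a small sketch. It is not the ProRata implementation: it assumes Gaussian errors on peptide log2 ratios with a known standard deviation per peptide (standing in for the variability predicted from signal-to-noise), a flat outlier component with hypothetical weight `outlier_fraction`, a simple grid search, and a 95% interval defined by a log-likelihood drop of 1.92. The expected-bias correction described above is omitted.

```python
import math

def protein_log2_ratio(peptide_log2_ratios, peptide_sds,
                       outlier_fraction=0.05, lo=-7.0, hi=7.0, step=0.01):
    """Grid-search maximum likelihood estimate of a protein log2 ratio.
    Returns (estimate, lower bound, upper bound)."""
    # Flat density for outlying peptides over the searched range (assumption).
    outlier_density = outlier_fraction / (hi - lo)

    def log_likelihood(mu):
        total = 0.0
        for x, sd in zip(peptide_log2_ratios, peptide_sds):
            gauss = math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
            # Mixture: well-quantified peptide or outlier.
            total += math.log((1.0 - outlier_fraction) * gauss + outlier_density)
        return total

    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    curve = [log_likelihood(mu) for mu in grid]
    best = max(curve)
    estimate = grid[curve.index(best)]
    # Grid points whose log-likelihood lies within 1.92 of the maximum
    # (half the 95% chi-square cutoff of 3.84 for one degree of freedom).
    inside = [mu for mu, ll in zip(grid, curve) if ll >= best - 1.92]
    return estimate, min(inside), max(inside)
```

With peptide log2 ratios of 1.0, 1.1, 0.9 and one outlier at 5.0, the estimate stays near 1.0, while a plain average would be pulled to 2.0. This illustrates why weighting and outlier suppression can beat simple averaging.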
The abundance ratio between the light and heavy isotopologues of an isotopically labeled peptide can be estimated from their selected ion chromatograms. However, quantitative shotgun proteomics measurements yield selected ion chromatograms at highly variable signal-to-noise ratios for tens of thousands of peptides. This challenge calls for algorithms that not only robustly estimate the abundance ratios of different peptides but also rigorously score each abundance ratio for the expected estimation bias and variability. Scoring of the abundance ratios, much like scoring of sequence assignment for tandem mass spectra by peptide identification algorithms, enables filtering of unreliable peptide quantification and use of formal statistical inference in the subsequent protein abundance ratio estimation. In this study, a parallel paired covariance algorithm is used for robust peak detection in selected ion chromatograms. A peak profile is generated for each peptide, which is a scatterplot of ion intensities measured for the two isotopologues within their chromatographic peaks. Principal component analysis of the peak profile is proposed to estimate the peptide abundance ratio and to score the estimation with the signal-to-noise ratio of the peak profile (profile signal-to-noise ratio). We demonstrate that the profile signal-to-noise ratio is inversely correlated with the variability and bias of peptide abundance ratio estimation.
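As a rough illustration of estimating a ratio from a peak profile with principal component analysis, the sketch below fits the first principal component to paired light and heavy intensities. The centering of the data, the use of the square root of the eigenvalue ratio as the profile signal-to-noise ratio, and the function name are assumptions made for this example and may differ from the published algorithm.

```python
import math

def pca_abundance_ratio(light, heavy):
    """Estimate the light:heavy ratio from paired ion intensities.
    Returns (ratio, profile_snr)."""
    n = len(light)
    mean_l = sum(light) / n
    mean_h = sum(heavy) / n
    # Sample covariance matrix [[s_ll, s_lh], [s_lh, s_hh]].
    s_ll = sum((x - mean_l) ** 2 for x in light) / (n - 1)
    s_hh = sum((y - mean_h) ** 2 for y in heavy) / (n - 1)
    s_lh = sum((x - mean_l) * (y - mean_h) for x, y in zip(light, heavy)) / (n - 1)
    if s_lh == 0:
        raise ValueError("light and heavy intensities are uncorrelated")
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (s_ll + s_hh) / 2.0
    root = math.sqrt(((s_ll - s_hh) / 2.0) ** 2 + s_lh ** 2)
    lam1 = half_trace + root
    lam2 = max(half_trace - root, 0.0)
    # First principal component direction: (s_lh, lam1 - s_ll).
    # Its light component divided by its heavy component is the ratio.
    ratio = s_lh / (lam1 - s_ll)
    # Spread along the component versus spread perpendicular to it.
    snr = math.inf if lam2 == 0.0 else math.sqrt(lam1 / lam2)
    return ratio, snr
```

For a noise-free profile in which the light isotopologue is always twice as intense as the heavy one, the estimated ratio is 2 and the signal-to-noise ratio is very large; adding noise to either trace lowers the signal-to-noise ratio.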
Rhodopseudomonas palustris is a purple nonsulfur anoxygenic phototrophic bacterium that is ubiquitous in soil and water. R. palustris is metabolically versatile with respect to energy generation and carbon and nitrogen metabolism. We have characterized and compared the baseline proteome of an R. palustris wild-type strain grown under six metabolic conditions. The methodology for proteome analysis involved protein fractionation by centrifugation, subsequent digestion with trypsin, and analysis of peptides by liquid chromatography coupled with tandem mass spectrometry. Using these methods, we identified 1664 proteins out of 4836 predicted proteins with conservative filtering constraints. A total of 107 novel hypothetical proteins and 218 conserved hypothetical proteins were detected. Qualitative analyses revealed over 311 proteins exhibiting marked differences between conditions, many of these being hypothetical or conserved hypothetical proteins showing strong correlations with different metabolic modes. For example, five proteins encoded by genes from a novel operon appeared only after anaerobic growth, with no evidence of these proteins in extracts of aerobically grown cells. Proteins known to be associated with specialized growth states, such as nitrogen fixation, photoautotrophic growth, or growth on benzoate, were observed to be up-regulated under those states.
Algorithmic search engines bridge the gap between large tandem mass spectrometry data sets and the identification of proteins associated with biological samples. Improvements in these tools can greatly enhance biological discovery. We present a new scoring scheme for comparing tandem mass spectra with a protein sequence database. The MASPIC (Multinomial Algorithm for Spectral Profile-based Intensity Comparison) scorer converts an experimental tandem mass spectrum into an m/z profile of probability and then scores peak lists from potential candidate peptides using a multinomial distribution model. The MASPIC scoring scheme incorporates intensity, spectral peak density variations, and m/z error distribution associated with peak matches into a multinomial distribution. The scoring scheme was validated on two standard protein mixtures and an additional set of spectra collected on a complex ribosomal protein mixture from Rhodopseudomonas palustris. The results indicate a 5-15% improvement over Sequest for high-confidence identifications. The performance gap grows as sequence database size increases. Additional tests on spectra from proteinase-K digest data showed similar performance improvements, demonstrating the advantages of using MASPIC for studying proteins digested with less specific proteases. All these investigations show MASPIC to be a versatile and reliable system for peptide tandem mass spectral identification.
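The notion of scoring candidate peak lists against a probability profile with a multinomial model can be illustrated with a toy example. This is not the MASPIC scorer: the fixed-width binning, the pseudocount, the m/z range, and both function names are assumptions for illustration, and the peak density and m/z error terms described above are left out.

```python
import math
from collections import Counter

def intensity_profile(peaks, bin_width=1.0, mz_min=0.0, mz_max=2000.0, pseudocount=1.0):
    """Turn (m/z, intensity) pairs into per-bin probabilities that sum to 1.
    The pseudocount keeps empty bins from having zero probability."""
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    weights = [pseudocount] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            weights[int((mz - mz_min) / bin_width)] += intensity
    total = sum(weights)
    return [w / total for w in weights]

def multinomial_log_score(profile, fragment_mzs, bin_width=1.0, mz_min=0.0):
    """Multinomial log-probability of a candidate's predicted fragment m/z
    values, treating each fragment as one draw from the binned profile."""
    mz_max = mz_min + len(profile) * bin_width
    counts = Counter(
        int((mz - mz_min) / bin_width)
        for mz in fragment_mzs
        if mz_min <= mz < mz_max
    )
    n = sum(counts.values())
    score = math.lgamma(n + 1)  # log(n!)
    for bin_index, c in counts.items():
        score += c * math.log(profile[bin_index]) - math.lgamma(c + 1)
    return score
```

A candidate whose predicted fragments fall in bins that hold intense experimental peaks receives a higher log score than one whose fragments fall in empty regions of the spectrum.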
We present a comprehensive mass spectrometric approach that integrates intact protein molecular mass measurement ("top-down") and proteolytic fragment identification ("bottom-up") to characterize the 70S ribosome from Rhodopseudomonas palustris. Forty-two intact protein identifications were obtained by the top-down approach and 53 out of the 54 orthologs to Escherichia coli ribosomal proteins were identified from bottom-up analysis. This integrated approach simplified the assignment of post-translational modifications by increasing the confidence of identifications, distinguishing between isoforms, and identifying the amino acid positions at which particular post-translational modifications occurred. Our combined mass spectrometry data also allowed us to check and validate the gene annotations for three ribosomal proteins predicted to possess extended C-termini. In particular, we identified a highly repetitive C-terminal "alanine tail" on L25. This type of low complexity sequence, common to eukaryotic proteins, has previously not been reported in prokaryotic proteins. To our knowledge, this is the most comprehensive protein complex analysis to date that integrates two MS techniques.