PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution.

Hu Y, Liu Y, Mao X, Jia C, Ferguson JF, Xue C, Reilly MP, Li H, Li M
Nucleic Acids Res. 2014; 42(3): e20

PMID: 24362841 · PMCID: PMC3919567 · DOI:10.1093/nar/gkt1304

Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data complicate the analysis and, if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this article, we present PennSeq, a statistical method that allows each isoform to have its own non-uniform read distribution. Instead of making parametric assumptions, we give adequate weight to the underlying data by using a non-parametric approach. Our rationale is that regardless of what factors lead to non-uniformity, whether hexamer priming bias, local sequence bias, positional bias, RNA degradation, mapping bias or other unknown reasons, the probability that a fragment is sampled from a particular region will be reflected in the aligned data. This empirical approach thus maximally reflects the true underlying non-uniform read distribution. We evaluate the performance of PennSeq using both simulated data with known ground truth and two real Illumina RNA-Seq data sets, including one with quantitative real-time polymerase chain reaction measurements. Our results indicate superior performance of PennSeq over existing methods, particularly for isoforms demonstrating severe non-uniformity. PennSeq is freely available for download at http://sourceforge.net/projects/pennseq.
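
The idea of letting the aligned data define a non-uniform read distribution can be illustrated with a minimal sketch. This is not PennSeq's actual algorithm or code: the function name `empirical_read_distribution`, the `pseudocount` smoothing, and the toy read positions are all assumptions made for illustration. It only shows how an empirical, non-parametric per-position sampling probability might be computed from read start positions on a single isoform.

```python
from collections import Counter


def empirical_read_distribution(read_starts, isoform_length, pseudocount=1.0):
    """Estimate the probability that a read starts at each isoform position.

    Illustrative only (hypothetical helper, not PennSeq's implementation):
    counts observed read starts per position and normalizes them, adding a
    pseudocount so that unobserved positions keep a small non-zero probability.
    """
    if isoform_length <= 0:
        raise ValueError("isoform_length must be positive")
    if any(s < 0 or s >= isoform_length for s in read_starts):
        raise ValueError("read start outside isoform coordinates")

    counts = Counter(read_starts)
    total = len(read_starts) + pseudocount * isoform_length
    return [(counts[i] + pseudocount) / total for i in range(isoform_length)]


# Toy example: five reads on a 5-base isoform, three of them starting at position 0.
probs = empirical_read_distribution([0, 0, 0, 1, 4], isoform_length=5)
print(probs)
```

A uniform model would assign every position the same probability; here positions with more aligned reads receive proportionally more weight, whatever the cause of the bias.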

MeSH Terms (9)

Adipose Tissue; Female; Gene Expression Profiling; Humans; Models, Statistical; Oligonucleotide Array Sequence Analysis; RNA Isoforms; Sequence Analysis, RNA; Statistics, Nonparametric
