Site identification in high-throughput RNA-protein interaction data.

Uren PJ, Bahrami-Samani E, Burns SC, Qiao M, Karginov FV, Hodges E, Hannon GJ, Sanford JR, Penalva LO, Smith AD
Bioinformatics. 2012 28 (23): 3013-20

PMID: 23024010 · PMCID: PMC3509493 · DOI:10.1093/bioinformatics/bts569

MOTIVATION - Post-transcriptional and co-transcriptional regulation is a crucial link between genotype and phenotype. The central players are the RNA-binding proteins, and experimental technologies [such as cross-linking with immunoprecipitation- (CLIP-) and RIP-seq] for probing their activities have advanced rapidly over the past decade. Statistically robust, flexible computational methods for binding site identification from high-throughput immunoprecipitation assays are, however, largely lacking.

RESULTS - We introduce a method for site identification that provides four key advantages over previous methods: (i) it can be applied to all variations of CLIP and RIP-seq technologies, (ii) it accurately models the underlying read-count distributions, (iii) it allows external covariates, such as transcript abundance (which we demonstrate is highly correlated with read count), to inform the site identification process and (iv) it allows for direct comparison of site usage across cell types or conditions.
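
The general idea of calling sites from a fitted read-count distribution can be sketched as follows. This is a minimal illustration only: the abstract does not specify the statistical model used, so the sketch assumes a zero-truncated Poisson as a simplified stand-in, ignores covariates such as transcript abundance, and applies no multiple-testing correction. All function names and the example counts are hypothetical.

```python
import math

def fit_zero_truncated_poisson(counts):
    """Maximum-likelihood lambda for a Poisson truncated at zero.

    Solves mean = lam / (1 - exp(-lam)) by bisection. Assumes every
    count is >= 1 (bins with no reads are not observed).
    """
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("mean of zero-truncated counts must exceed 1")
    lo, hi = 1e-9, mean  # the root always lies below the sample mean
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def upper_tail_pvalue(k, lam):
    """P(X >= k | X >= 1) for X ~ Poisson(lam)."""
    if k <= 1:
        return 1.0
    # First term of the upper tail, computed in log space for stability.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, j = 0.0, k
    while term > 0.0 and (j <= lam or term > total * 1e-16):
        total += term
        j += 1
        term *= lam / j
    return min(1.0, total / -math.expm1(-lam))

def call_sites(counts, alpha=0.01):
    """Indices of bins whose read count is unexpectedly high."""
    lam = fit_zero_truncated_poisson(counts)
    return [i for i, k in enumerate(counts)
            if upper_tail_pvalue(k, lam) < alpha]

# Hypothetical read counts for 20 non-empty bins; bin 10 is enriched.
counts = [1, 2, 1, 3, 2, 1, 2, 1, 1, 2, 40, 1, 2, 3, 1, 2, 1, 1, 2, 2]
sites = call_sites(counts)
```

A regression-based variant would let the expected count in each bin depend on a covariate such as transcript abundance, rather than using a single shared rate as this sketch does.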

AVAILABILITY AND IMPLEMENTATION - We have implemented our method in a software tool called Piranha. Source code and binaries, licensed under the GNU General Public License (version 3), are freely available for download from


SUPPLEMENTARY INFORMATION - Supplementary data available at Bioinformatics online.

MeSH Terms (11)

Base Sequence; Binding Sites; Computational Biology; HEK293 Cells; HeLa Cells; High-Throughput Nucleotide Sequencing; Humans; RNA; RNA-Binding Proteins; Sequence Analysis, RNA; Software
