TY - JOUR
T1 - Flexible analysis of RNA-seq data using mixed effects models
AU - Turro, Ernest
AU - Astle, William J.
AU - Tavaré, Simon
N1 - Funding Information: E.T. and S.T. were funded by Cancer Research UK grant C14303/A10825 and E.T. was further funded by the Cambridge Biomedical Research Centre. W.J.A. was funded by UK BBSRC grant BB/E020372/1 and a Team Grant from the Fonds de recherche du Québec – Nature et technologies.
PY - 2014/1
Y1 - 2014/1
N2 - Motivation: Most methods for estimating differential expression from RNA-seq are based on statistics that compare normalized read counts between treatment classes. Unfortunately, reads are in general too short to be mapped unambiguously to features of interest, such as genes, isoforms or haplotype-specific isoforms. There are methods for estimating expression levels that account for this source of ambiguity. However, the uncertainty is not generally accounted for in downstream analysis of gene expression experiments. Moreover, at the individual transcript level, it can sometimes be too large to allow useful comparisons between treatment groups. Results: In this article we make two proposals that improve the power, specificity and versatility of expression analysis using RNA-seq data. First, we present a Bayesian method for model selection that accounts for read mapping ambiguities using random effects. This polytomous model selection approach can be used to identify many interesting patterns of gene expression and is not confined to detecting differential expression between two groups. For illustration, we use our method to detect imprinting, different types of regulatory divergence in cis and in trans and differential isoform usage, but many other applications are possible. Second, we present a novel collapsing algorithm for grouping transcripts into inferential units that exploits the posterior correlation between transcript expression levels. The aggregate expression levels of these units can be estimated with useful levels of uncertainty. Our algorithm can improve the precision of expression estimates when uncertainty is large with only a small reduction in biological resolution. Availability and implementation: We have implemented our software in the mmdiff and mmcollapse multithreaded C++ programs as part of the open-source MMSEQ package, available on https://github.com/eturro/mmseq. Supplementary information: Supplementary data are available at Bioinformatics online.
AB - Motivation: Most methods for estimating differential expression from RNA-seq are based on statistics that compare normalized read counts between treatment classes. Unfortunately, reads are in general too short to be mapped unambiguously to features of interest, such as genes, isoforms or haplotype-specific isoforms. There are methods for estimating expression levels that account for this source of ambiguity. However, the uncertainty is not generally accounted for in downstream analysis of gene expression experiments. Moreover, at the individual transcript level, it can sometimes be too large to allow useful comparisons between treatment groups. Results: In this article we make two proposals that improve the power, specificity and versatility of expression analysis using RNA-seq data. First, we present a Bayesian method for model selection that accounts for read mapping ambiguities using random effects. This polytomous model selection approach can be used to identify many interesting patterns of gene expression and is not confined to detecting differential expression between two groups. For illustration, we use our method to detect imprinting, different types of regulatory divergence in cis and in trans and differential isoform usage, but many other applications are possible. Second, we present a novel collapsing algorithm for grouping transcripts into inferential units that exploits the posterior correlation between transcript expression levels. The aggregate expression levels of these units can be estimated with useful levels of uncertainty. Our algorithm can improve the precision of expression estimates when uncertainty is large with only a small reduction in biological resolution. Availability and implementation: We have implemented our software in the mmdiff and mmcollapse multithreaded C++ programs as part of the open-source MMSEQ package, available on https://github.com/eturro/mmseq. Supplementary information: Supplementary data are available at Bioinformatics online.
UR - http://www.scopus.com/inward/record.url?scp=84892702660&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btt624
DO - 10.1093/bioinformatics/btt624
M3 - Article
C2 - 24281695
AN - SCOPUS:84892702660
SN - 1367-4803
VL - 30
SP - 180
EP - 188
JO - Bioinformatics
JF - Bioinformatics
IS - 2
ER -