Modeling RNA degradation for RNA-Seq with applications
Date: 18-10-2012

LIN WAN a, XITING YAN, TING CHEN b, FENGZHU SUN *c

a Molecular and Computational Biology Program, University of Southern California, Los Angeles,
CA 90089, USA and Academy of Mathematics and Systems Science, Chinese Academy of Sciences,
Beijing 100190, People’s Republic of China

b Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA

c Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089, USA and Tsinghua National Laboratory for Information Science and Technology/Department of Automation, Tsinghua University,
Beijing 100084, People’s Republic of China

fsun@usc.edu

 

Abstract

 
RNA-Seq is widely used in biological and biomedical studies. Methods for estimating transcript
abundance from RNA-Seq data have been studied intensively, and many of them assume that the
short reads are uniformly distributed along the transcripts. However, short reads are in fact
nonuniformly distributed along the transcripts, which can greatly reduce the accuracy of methods
based on the uniform assumption. Several methods have been developed to adjust for the biases
induced by this nonuniformity, using the empirical distribution of short reads within transcripts.
As an alternative, we found that RNA degradation plays a major role in forming the nonuniform
distribution of short reads, and we therefore developed a new approach that quantifies this
distribution by precisely modeling RNA degradation. Our model of RNA degradation fits
RNA-Seq data quite well, and based on this model we further developed a new statistical method to
estimate the transcript expression level, as well as the RNA degradation rate, for individual genes
and their isoforms. We showed that our method can improve the accuracy of transcript isoform
expression estimation. The estimated RNA degradation rates of individual transcripts are consistent
across samples and/or experiments/platforms. In addition, the RNA degradation rate from our model
is independent of RNA length, consistent with previous studies on RNA decay rate.
Keywords: EM algorithm; Gene expression; Next generation sequencing; RNA degradation; RNA-Seq.
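
As an illustration of the baseline the abstract refers to, the following is a minimal sketch of an EM algorithm that estimates isoform proportions under the uniform-read assumption. It is not the degradation-based method of the paper, whose model is not specified in this abstract. The function name `em_isoform_abundance`, the read-compatibility matrix `compat`, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def em_isoform_abundance(compat, lengths, n_iter=500, tol=1e-10):
    """EM estimate of isoform proportions under the uniform-read assumption.

    compat[i, k] is 1 if read i could have come from isoform k, else 0
    (hypothetical input format). lengths[k] is the length of isoform k.
    Returns (theta, rho): theta[k] is the estimated fraction of reads from
    isoform k, rho[k] the length-normalized relative transcript abundance.
    """
    compat = np.asarray(compat, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    if np.any(compat.sum(axis=1) == 0):
        raise ValueError("every read must be compatible with some isoform")
    n_reads, n_iso = compat.shape

    # Uniform assumption: a read from isoform k is equally likely to start
    # anywhere on it, so its positional likelihood is 1 / length_k.
    lik = compat / lengths

    theta = np.full(n_iso, 1.0 / n_iso)  # start from equal proportions
    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each isoform.
        w = lik * theta
        w /= w.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average posterior responsibilities.
        new_theta = w.sum(axis=0) / n_reads
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break

    # Convert read fractions to relative transcript abundance.
    rho = theta / lengths
    rho /= rho.sum()
    return theta, rho


# Toy data (made up): 40 reads map only to isoform 0, 60 reads map to both.
compat = np.vstack([np.tile([1, 0], (40, 1)), np.tile([1, 1], (60, 1))])
lengths = [1000, 500]
theta, rho = em_isoform_abundance(compat, lengths)
print("read fractions:", theta)
print("relative abundances:", rho)
```

In a sketch like this, a nonuniform model would in principle replace the `1 / length` term with a position-dependent read density, which is the kind of quantity a degradation model is meant to supply.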

Biostatistics, 2012
