TY - JOUR
T1 - Computational identification of noncoding RNAs in E. coli by comparative genomics
AU - Rivas, Elena
AU - Klein, Robert J.
AU - Jones, Thomas A.
AU - Eddy, Sean R.
N1 - Funding Information: This work was supported by the Howard Hughes Medical Institute (HHMI), the National Institutes of Health National Human Genome Research Institute, a Sloan Foundation postdoctoral fellowship to E.R., and an HHMI graduate fellowship to R.J.K.
PY - 2001/9/4
Y1 - 2001/9/4
N2 - Some genes produce noncoding transcripts that function directly as structural, regulatory, or even catalytic RNAs [1, 2]. Unlike protein-coding genes, which can be detected as open reading frames with distinctive statistical biases, noncoding RNA (ncRNA) gene sequences have no obvious inherent statistical biases [3]. Thus, genome sequence analyses reveal novel protein-coding genes, but any novel ncRNA genes remain invisible. Here, we describe a computational comparative genomic screen for ncRNA genes. The key idea is to distinguish conserved RNA secondary structures from a background of other conserved sequences using probabilistic models of expected mutational patterns in pairwise sequence alignments. We report the first whole-genome screen for ncRNA genes done with this method, in which we applied it to the "intergenic" spacers of Escherichia coli using comparative sequence data from four related bacteria. Starting from >23,000 conserved interspecies pairwise alignments, the screen predicted 275 candidate structural RNA loci. A sample of 49 candidate loci was assayed experimentally. At least 11 loci expressed small, apparently noncoding RNA transcripts of unknown function. Our computational approach may be used to discover structural ncRNA genes in any genome for which appropriate comparative genome sequence data are available.
UR - http://www.scopus.com/inward/record.url?scp=0035806973&partnerID=8YFLogxK
U2 - 10.1016/S0960-9822(01)00401-8
DO - 10.1016/S0960-9822(01)00401-8
M3 - Article
C2 - 11553332
AN - SCOPUS:0035806973
SN - 0960-9822
VL - 11
SP - 1369
EP - 1373
JO - Current Biology
JF - Current Biology
IS - 17
ER -