TY - JOUR
T1 - Characterization of the human ESC transcriptome by hybrid sequencing
AU - Au, Kin Fai
AU - Sebastiano, Vittorio
AU - Afshar, Pegah Tootoonchi
AU - Durruthy, Jens Durruthy
AU - Lee, Lawrence
AU - Williams, Brian A.
AU - Van Bakel, Harm
AU - Schadt, Eric E.
AU - Reijo-Pera, Renee A.
AU - Underwood, Jason G.
AU - Wong, Wing Hung
PY - 2013/12/10
Y1 - 2013/12/10
N2 - Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.
AB - Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.
KW - Alternative splicing
KW - Isoform discovery
KW - LncRNA
KW - PacBio
KW - hESC transcriptome
UR - https://www.scopus.com/pages/publications/84890286068
U2 - 10.1073/pnas.1320101110
DO - 10.1073/pnas.1320101110
M3 - Article
AN - SCOPUS:84890286068
SN - 0027-8424
VL - 110
SP - E4821-E4830
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 50
ER -