TY - JOUR
T1 - An improved canine genome and a comprehensive catalogue of coding genes and non-coding transcripts
AU - Hoeppner, Marc P.
AU - Lundquist, Andrew
AU - Pirun, Mono
AU - Meadows, Jennifer R.S.
AU - Zamani, Neda
AU - Johnson, Jeremy
AU - Sundström, Görel
AU - Cook, April
AU - FitzGerald, Michael G.
AU - Swofford, Ross
AU - Mauceli, Evan
AU - Moghadam, Behrooz Torabi
AU - Greka, Anna
AU - Alföldi, Jessica
AU - Abouelleil, Amr
AU - Aftuck, Lynne
AU - Bessette, Daniel
AU - Berlin, Aaron
AU - Brown, Adam
AU - Gearin, Gary
AU - Lui, Annie
AU - Macdonald, J. Pendexter
AU - Priest, Margaret
AU - Shea, Terrance
AU - Turner-Maier, Jason
AU - Zimmer, Andrew
AU - Lander, Eric S.
AU - Di Palma, Federica
AU - Lindblad-Toh, Kerstin
AU - Grabherr, Manfred G.
PY - 2014/3/13
Y1 - 2014/3/13
N2 - The domestic dog, Canis familiaris, is a well-established model system for mapping trait and disease loci. While the original draft sequence was of good quality, gaps were abundant, particularly in promoter regions of the genome, negatively impacting the annotation and study of candidate genes. Here, we present an improved genome build, canFam3.1, which includes 85 Mb of novel sequence and now covers 99.8% of the euchromatic portion of the genome. We also present multiple RNA-Sequencing data sets from 10 different canine tissues to catalog ∼175,000 expressed loci. While about 90% of the coding genes previously annotated by EnsEMBL have measurable expression in at least one sample, the number of transcript isoforms detected by our data expands the EnsEMBL annotations by a factor of four. Syntenic comparison with the human genome revealed an additional ∼3,000 loci that are characterized as protein coding in human and were also expressed in the dog, suggesting that they had not previously been annotated in the EnsEMBL canine gene set. In addition to ∼20,700 high-confidence protein coding loci, we found ∼4,600 antisense transcripts overlapping exons of protein coding genes; ∼7,200 intergenic multi-exon transcripts without coding potential, which are likely candidates for long intergenic non-coding RNAs (lincRNAs); and ∼11,000 transcripts that were reported by two different library construction methods but did not fit any of the above categories. Of the lincRNAs, about 6,000 have no annotated orthologs in human or mouse. In functional analyses, shRNA knockdown of two novel transcripts in a mouse kidney cell line altered cell morphology and motility. All in all, we provide a much-improved annotation of the canine genome and suggest regulatory functions for several of the novel non-coding transcripts.
AB - The domestic dog, Canis familiaris, is a well-established model system for mapping trait and disease loci. While the original draft sequence was of good quality, gaps were abundant, particularly in promoter regions of the genome, negatively impacting the annotation and study of candidate genes. Here, we present an improved genome build, canFam3.1, which includes 85 Mb of novel sequence and now covers 99.8% of the euchromatic portion of the genome. We also present multiple RNA-Sequencing data sets from 10 different canine tissues to catalog ∼175,000 expressed loci. While about 90% of the coding genes previously annotated by EnsEMBL have measurable expression in at least one sample, the number of transcript isoforms detected by our data expands the EnsEMBL annotations by a factor of four. Syntenic comparison with the human genome revealed an additional ∼3,000 loci that are characterized as protein coding in human and were also expressed in the dog, suggesting that they had not previously been annotated in the EnsEMBL canine gene set. In addition to ∼20,700 high-confidence protein coding loci, we found ∼4,600 antisense transcripts overlapping exons of protein coding genes; ∼7,200 intergenic multi-exon transcripts without coding potential, which are likely candidates for long intergenic non-coding RNAs (lincRNAs); and ∼11,000 transcripts that were reported by two different library construction methods but did not fit any of the above categories. Of the lincRNAs, about 6,000 have no annotated orthologs in human or mouse. In functional analyses, shRNA knockdown of two novel transcripts in a mouse kidney cell line altered cell morphology and motility. All in all, we provide a much-improved annotation of the canine genome and suggest regulatory functions for several of the novel non-coding transcripts.
UR - http://www.scopus.com/inward/record.url?scp=84898718409&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0091172
DO - 10.1371/journal.pone.0091172
M3 - Article
C2 - 24625832
AN - SCOPUS:84898718409
SN - 1932-6203
VL - 9
JO - PLoS ONE
JF - PLoS ONE
IS - 3
M1 - e91172
ER -