TY - JOUR
T1 - Small open reading frames
T2 - a comparative genetics approach to validation
AU - Jain, Niyati
AU - Richter, Felix
AU - Adzhubei, Ivan
AU - Sharp, Andrew J.
AU - Gelb, Bruce D.
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
AB - Open reading frames (ORFs) with fewer than 100 codons are generally not annotated in genomes, although bona fide genes of that size are known. Newer biochemical studies have suggested that thousands of small protein-coding ORFs (smORFs) may exist in the human genome, but the true number and the biological significance of the micropeptides they encode remain uncertain. Here, we used a comparative genomics approach to identify high-confidence smORFs that are likely protein-coding. We identified 3,326 high-confidence smORFs using constraint within human populations and evolutionary conservation as additional lines of evidence. Next, we validated that, as a group, our high-confidence smORFs are conserved at the amino-acid level rather than merely residing in highly conserved non-coding regions. Finally, we found that high-confidence smORFs are enriched among disease-associated variants from GWAS. Overall, our results highlight that smORF-encoded peptides likely have important functional roles in human disease.
KW - Comparative genetics
KW - Evolutionary conservation
KW - Human genetic variation
KW - Micropeptides
KW - Small open reading frames
UR - http://www.scopus.com/inward/record.url?scp=85157963694&partnerID=8YFLogxK
U2 - 10.1186/s12864-023-09311-7
DO - 10.1186/s12864-023-09311-7
M3 - Article
C2 - 37127568
AN - SCOPUS:85157963694
SN - 1471-2164
VL - 24
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - 226
ER -