TY - JOUR
T1 - ClipSV
T2 - Improving structural variation detection by read extension, spliced alignment and tree-based decision rules
AU - Xu, Peng
AU - Chen, Yu
AU - Gao, Min
AU - Chong, Zechen
N1 - Publisher Copyright:
© 2021 The Author(s).
PY - 2021/3/1
Y1 - 2021/3/1
N2 - Structural variation (SV), which consists of genomic variation from 50 to millions of base pairs, confers considerable impacts on human diseases, complex traits and evolution. Accurately detecting SV is a fundamental step to characterize the features of individual genomes. Currently, several methods have been proposed to detect SVs using the next-generation sequencing (NGS) platform. However, due to the short length of sequencing reads and the complexity of SV content, the SV-detecting tools are still limited by low sensitivity, especially for insertion detection. In this study, we developed a novel tool, ClipSV, to improve SV discovery. ClipSV utilizes a read extension and spliced alignment approach to overcoming the limitation of read length. By reconstructing long sequences from SV-associated short reads, ClipSV discovers deletions and short insertions from the long sequence alignments. To comprehensively characterize insertions, ClipSV implements tree-based decision rules that can efficiently utilize SV-containing reads. Based on the evaluations of both simulated and real sequencing data, ClipSV exhibited an overall better performance compared to currently popular tools, especially for insertion detection. As NGS platform represents the mainstream sequencing capacity for routine genomic applications, we anticipate ClipSV will serve as an important tool for SV characterization in future genomic studies.
AB - Structural variation (SV), which consists of genomic variation from 50 to millions of base pairs, confers considerable impacts on human diseases, complex traits and evolution. Accurately detecting SV is a fundamental step to characterize the features of individual genomes. Currently, several methods have been proposed to detect SVs using the next-generation sequencing (NGS) platform. However, due to the short length of sequencing reads and the complexity of SV content, the SV-detecting tools are still limited by low sensitivity, especially for insertion detection. In this study, we developed a novel tool, ClipSV, to improve SV discovery. ClipSV utilizes a read extension and spliced alignment approach to overcoming the limitation of read length. By reconstructing long sequences from SV-associated short reads, ClipSV discovers deletions and short insertions from the long sequence alignments. To comprehensively characterize insertions, ClipSV implements tree-based decision rules that can efficiently utilize SV-containing reads. Based on the evaluations of both simulated and real sequencing data, ClipSV exhibited an overall better performance compared to currently popular tools, especially for insertion detection. As NGS platform represents the mainstream sequencing capacity for routine genomic applications, we anticipate ClipSV will serve as an important tool for SV characterization in future genomic studies.
UR - https://www.scopus.com/pages/publications/85123224807
U2 - 10.1093/nargab/lqab003
DO - 10.1093/nargab/lqab003
M3 - Article
AN - SCOPUS:85123224807
SN - 2631-9268
VL - 3
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
IS - 1
ER -