TY - JOUR
T1 - Characterization of structural variants with single molecule and hybrid sequencing approaches
AU - Ritz, Anna
AU - Bashir, Ali
AU - Sindi, Suzanne
AU - Hsu, David
AU - Hajirasouliha, Iman
AU - Raphael, Benjamin J.
N1 - Publisher Copyright: © The Author 2014. Published by Oxford University Press. All rights reserved.
PY - 2014/12/15
Y1 - 2014/12/15
AB - Motivation: Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent 'third-generation' sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. Results: We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single molecule sequencing data, paired read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly.
UR - http://www.scopus.com/inward/record.url?scp=84975741816&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btu714
DO - 10.1093/bioinformatics/btu714
M3 - Article
C2 - 25355789
AN - SCOPUS:84975741816
SN - 1367-4803
VL - 30
SP - 3458
EP - 3466
JO - Bioinformatics
JF - Bioinformatics
IS - 24
ER -