TY - JOUR
T1 - Strobe sequence design for haplotype assembly
AU - Lo, Christine
AU - Bashir, Ali
AU - Bansal, Vikas
AU - Bafna, Vineet
N1 - Funding Information:
CL and VB acknowledge partial grant support from the NSF (IIS0810905), NIH (RO1-HG004962), a U54 center grant (5U54HL108460), and a NSF Graduate Research Fellowship for CL. This article has been published as part of BMC Bioinformatics Volume 12 Supplement 1, 2011: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12? issue=S1.
PY - 2011/2/15
Y1 - 2011/2/15
N2 - Background: Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. As the pair of chromosomes are mostly identical to each other, linking together of alleles at heterozygous sites is sufficient to phase, or separate the two chromosomes. In Haplotype Assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been a lot of research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to get long haplotypes. Here, we describe the different design parameters that can be adjusted with next generation and upcoming sequencing technologies, and study the impact of design choice on the length of the haplotype.Results: We show that a number of parameters influence haplotype length, with the most significant one being the advance length (distance between two fragments of a clone). Given technologies like strobe sequencing that allow for large variations in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance-lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results a 1-2 order of magnitude improvement in median haplotype length.Conclusions: Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies.
AB - Background: Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. As the pair of chromosomes are mostly identical to each other, linking together of alleles at heterozygous sites is sufficient to phase, or separate the two chromosomes. In Haplotype Assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been a lot of research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to get long haplotypes. Here, we describe the different design parameters that can be adjusted with next generation and upcoming sequencing technologies, and study the impact of design choice on the length of the haplotype.Results: We show that a number of parameters influence haplotype length, with the most significant one being the advance length (distance between two fragments of a clone). Given technologies like strobe sequencing that allow for large variations in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance-lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results a 1-2 order of magnitude improvement in median haplotype length.Conclusions: Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies.
UR - http://www.scopus.com/inward/record.url?scp=79951534673&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-12-S1-S24
DO - 10.1186/1471-2105-12-S1-S24
M3 - Article
C2 - 21342554
AN - SCOPUS:79951534673
SN - 1471-2105
VL - 12
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - SUPPL. 1
M1 - S24
ER -