TY - JOUR
T1 - Detecting epigenetic motifs in low coverage and metagenomics settings
AU - Beckmann, Noam D.
AU - Karri, Sashank
AU - Fang, Gang
AU - Bashir, Ali
N1 - Funding Information:
Publication costs for this manuscript were funded in part by RECOMB-Seq and in part by the Icahn School of Medicine at Mount Sinai through seed funding to A.B. This article has been published as part of BMC Bioinformatics Volume 15 Supplement 9, 2014: Proceedings of the Fourth Annual RECOMB Satellite Workshop on Massively Parallel Sequencing (RECOMB-Seq 2014). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S9.
Funding Information:
We thank Jonas Korlach and Tyson Clark at Pacific Biosciences for providing access to raw sequencing data as well as secondary analysis of samples from Murray et al., and John Beaulaurier from the Fang Lab for helpful discussions and code-sharing. We would also like to acknowledge Dr. Eric Schadt and Dr. Jun Zhu for their support of N.D.B. and S.K., respectively. This work was also supported in part through the computational resources and staff expertise provided by the Department of Scientific Computing at the Icahn School of Medicine at Mount Sinai.
Publisher Copyright:
© 2014 Beckmann et al.
PY - 2014
Y1 - 2014
N2 - Background: It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third generation sequencing data. Monitoring the speed at which a single polymerase inserts a base in the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes. Methods: Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with low coverage, with incomplete references, and with mixed samples (i.e. metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with corresponding whole genome amplified (WGA, modification free) IPD distributions using log likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer's neighborhood. Conclusions: Our method can detect multiple types of modifications, even at very low coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with "neighbor" modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomics contigs and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection.
AB - Background: It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third generation sequencing data. Monitoring the speed at which a single polymerase inserts a base in the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes. Methods: Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with low coverage, with incomplete references, and with mixed samples (i.e. metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with corresponding whole genome amplified (WGA, modification free) IPD distributions using log likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer's neighborhood. Conclusions: Our method can detect multiple types of modifications, even at very low coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with "neighbor" modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomics contigs and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection.
UR - http://www.scopus.com/inward/record.url?scp=84921766043&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-15-S9-S16
DO - 10.1186/1471-2105-15-S9-S16
M3 - Article
C2 - 25253358
AN - SCOPUS:84921766043
SN - 1471-2105
VL - 15
SP - S16
JO - BMC Bioinformatics
JF - BMC Bioinformatics
ER -