Abstract
Let a seed, S, be a string over the alphabet {1,*}, of arbitrary length k, which starts and ends with a 1. For example, S = 11*1. S occurs in a binary string T at position h if the length-k substring of T ending at position h contains a 1 in every position where there is a 1 in S. We say that the 1s at the corresponding positions in T are covered. We are interested in calculating the probability distribution of the number of 1s covered by a seed S in an i.i.d. Bernoulli string of length n with probability of 1 equal to p. We refer to this new probability distribution as C_{n,S,p}, for covered, with S being the seed. We present an efficient method to calculate this distribution exactly. Covered 1s represent matching positions detected in DNA sequences when using multiple hits of a spaced seed. Knowledge of the distribution provides a statistical threshold for distinguishing true homologies from randomly matching sequences.
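The definitions in the abstract can be made concrete with a small sketch. The code below is not the efficient method of the paper; it is a brute-force illustration that enumerates all 2^n binary strings, so it is only practical for very small n. The function names `covered_ones` and `brute_force_distribution` are hypothetical, chosen here for illustration, and positions are 0-based in the code.

```python
from itertools import product


def covered_ones(seed: str, text: str) -> int:
    """Count the 1s in `text` covered by at least one occurrence of `seed`.

    `seed` is a string over {'1', '*'}; `text` is a string over {'0', '1'}.
    """
    k = len(seed)
    covered = set()
    # Slide a length-k window over the text; the window is text[end-k:end].
    for end in range(k, len(text) + 1):
        window = text[end - k:end]
        # The seed occurs if every '1' in the seed faces a '1' in the window.
        if all(w == "1" for s, w in zip(seed, window) if s == "1"):
            # Only positions aligned with a '1' of the seed are covered.
            covered.update(end - k + i for i, s in enumerate(seed) if s == "1")
    return len(covered)


def brute_force_distribution(seed: str, n: int, p: float) -> list:
    """Return dist, where dist[c] = P(exactly c covered 1s), by enumeration.

    Assumes an i.i.d. Bernoulli string of length n with P(1) = p.
    Runs in time exponential in n (illustration only).
    """
    dist = [0.0] * (n + 1)
    for bits in product("01", repeat=n):
        text = "".join(bits)
        ones = text.count("1")
        dist[covered_ones(seed, text)] += p ** ones * (1 - p) ** (n - ones)
    return dist


if __name__ == "__main__":
    print(covered_ones("11*1", "11111"))
    print(brute_force_distribution("11*1", 4, 0.5))
```

With the seed 11*1 and the text 11111, the seed occurs twice and the two occurrences together cover all five 1s, whereas in the text 1111 the single occurrence leaves the 1 under the `*` uncovered.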
| Original language | English |
|---|---|
| Pages (from-to) | 282-293 |
| Number of pages | 12 |
| Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| Volume | 5280 LNCS |
| State | Published - 2008 |
| Externally published | Yes |
| Event | 15th International Symposium on String Processing and Information Retrieval, SPIRE 2008, Melbourne, VIC, Australia. Duration: 10 Nov 2008 → 12 Nov 2008 |
| Title | Exact distribution of a spaced seed statistic for DNA homology detection |