Abstract
We describe a new distance measure for comparing DNA sequence profiles. For this measure, each column of a multiple alignment is treated as a character frequency vector whose entries sum to one. The distance between two vectors is based on the minimum path length along an entropy surface. Path length is estimated using a random graph generated on the entropy surface and Dijkstra's algorithm for computing all shortest paths to a source. We use the new distance measure to analyze similarities within families of tandem repeats in the C. elegans genome and show that it gives a more accurate refinement of family relationships than a method based on comparing consensus sequences.
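
The idea of estimating a path length on an entropy surface with a random graph and Dijkstra's algorithm can be illustrated with a minimal sketch. The abstract does not specify the surface parameterization, the sampling scheme, or the edge rule, so everything below is an assumption made for illustration: a four-letter DNA alphabet, Shannon entropy in bits as the surface height, uniform sampling of the frequency simplex, a k-nearest-neighbour graph, and Euclidean edge lengths between lifted points. The names `lift`, `build_graph`, and `profile_column_distance` are hypothetical and are not taken from the paper.

```python
import heapq
import math
import random


def entropy(p):
    """Shannon entropy (in bits) of a frequency vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def lift(p):
    """Assumed surface: append the entropy as an extra coordinate, (p, H(p))."""
    return tuple(p) + (entropy(p),)


def random_frequency_vector(rng, n=4):
    """Sample a point uniformly from the simplex (normalized exponentials)."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return tuple(x / total for x in w)


def build_graph(points, k):
    """Assumed edge rule: link each lifted point to its k nearest neighbours."""
    lifted = [lift(p) for p in points]
    adj = {i: {} for i in range(len(points))}
    for i, a in enumerate(lifted):
        nearest = sorted(
            (math.dist(a, b), j) for j, b in enumerate(lifted) if j != i
        )
        for d, j in nearest[:k]:
            adj[i][j] = d  # store both directions so the graph is undirected
            adj[j][i] = d
    return adj


def dijkstra(adj, source):
    """Shortest path lengths from `source` to every vertex of the graph."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def profile_column_distance(p, q, n_samples=300, k=8, seed=0):
    """Estimate the surface path length between two alignment columns.

    Returns math.inf if the random graph happens to leave p and q unconnected.
    """
    rng = random.Random(seed)
    points = [tuple(p), tuple(q)]
    points += [random_frequency_vector(rng) for _ in range(n_samples)]
    adj = build_graph(points, k)
    return dijkstra(adj, 0)[1]
```

A call such as `profile_column_distance((0.7, 0.1, 0.1, 0.1), (0.1, 0.1, 0.1, 0.7))` returns an estimate that can never be shorter than the straight-line distance between the two lifted points, since every graph path is a chain of straight segments. Increasing `n_samples` tightens the approximation of the surface at the cost of a larger graph.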
Original language | English |
---|---|
Pages (from-to) | S44-S53 |
Journal | Bioinformatics |
Volume | 18 |
Issue number | SUPPL. 2 |
State | Published - 2002 |