Information theoretic approaches to whole genome phylogenies

David Burstein, Igor Ulitsky, Tamir Tuller, Benny Chor

Research output: Contribution to journal › Conference article › peer-review

9 Scopus citations

Abstract

We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes. The core of our method is a new measure of pairwise distances between sequences, whose lengths may greatly vary. This measure is based on information theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. The algorithm uses suffix arrays to compute the distance between two sequences of length l in O(l) time. It is fast enough to enable the construction of the phylogenomic tree for hundreds of species, and the phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with "acceptable phylogenetic truth". To assess our approach, it was implemented together with a number of alternative approaches, including two that were previously published in the literature. When their outcomes were compared to ours, using a "traditional" tree and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin.
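
As a rough illustration of how a Kullback-Leibler relative entropy can be turned into a pairwise distance between sequences of different lengths, the following minimal sketch compares smoothed k-mer frequency distributions. It is not the measure or the algorithm described in the abstract: the k-mer estimator, the add-one smoothing, the symmetrization, and all function names and parameters (`kmer_distribution`, `kl_divergence`, `symmetric_kl_distance`, `k`, `alphabet`) are assumptions made for this example, and it does not use suffix arrays or achieve the O(l) bound stated above.

```python
import math
from collections import Counter
from itertools import product


def kmer_distribution(seq, k, alphabet):
    """Smoothed k-mer frequencies over every possible k-mer of the alphabet.

    Illustrative estimator only (an assumption of this sketch).
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Add-one smoothing keeps every probability positive, so the
    # logarithm in the relative entropy is always defined.
    total = sum(counts[m] for m in kmers) + len(kmers)
    return {m: (counts[m] + 1) / total for m in kmers}


def kl_divergence(p, q):
    """Kullback-Leibler relative entropy D(p || q), in bits."""
    return sum(p[m] * math.log2(p[m] / q[m]) for m in p)


def symmetric_kl_distance(seq_a, seq_b, k=2, alphabet="ACGT"):
    """Symmetrized relative entropy between two k-mer distributions."""
    p = kmer_distribution(seq_a, k, alphabet)
    q = kmer_distribution(seq_b, k, alphabet)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    a = "ACGTACGTACGT"
    b = "ACGTACGAACGTACGT"  # similar to a, different length
    c = "TTTTTTTTTTTT"      # very different composition
    print(symmetric_kl_distance(a, b))
    print(symmetric_kl_distance(a, c))
```

Because the distributions are normalized, the two sequences need not have the same length. A matrix of such pairwise values could then be passed to any distance-based tree reconstruction method.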

Original language: English
Pages (from-to): 283-295
Number of pages: 13
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3500
State: Published - 2005
Externally published: Yes
Event: 9th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2005 - Cambridge, MA, United States
Duration: 14 May 2005 to 18 May 2005

Keywords

  • Distance matrix
  • Divergence
  • Kullback-Leibler relative entropy
  • Phylogenomics
  • Tree reconstruction
  • Whole genome and proteome phylogeny
