TY - GEN
T1 - Gene cluster profile vectors
T2 - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
AU - Pejaver, Vikas Rao
AU - Kim, Sun
PY - 2010
Y1 - 2010
AB - Proximity-based methods and co-evolution-based phylogenetic profiles methods have been successfully used for the identification of functionally related genes. Proximity-based methods are effective for physically clustered genes while the phylogenetic profiles method is effective for co-occurring gene sets. However, both methods predict many false positives and false negatives. In this paper, we propose the Gene Cluster Profile Vector (GCPV) method, which combines these two methods by using phylogenetic profiles of whole gene clusters. Moreover, the GCPV method is, currently, the only method that allows for the characterization of relationships between gene clusters themselves. The GCPV method groups together reasonably related operons in E. coli about 60% of the time. The method is minimally dependent on the reference genome set used and it outperforms the conventional phylogenetic profiles method. Finally, we show that the method works well for predicted gene clusters from C. crescentus and can serve as an important tool not only for understanding gene function, but also for elucidating mechanisms of general biological processes.
KW - Cosine similarity
KW - Functional relationships
KW - Gene clusters
KW - Genome annotation
KW - Phylogenetic profiles
UR - http://www.scopus.com/inward/record.url?scp=79952405735&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2010.5706530
DO - 10.1109/BIBM.2010.5706530
M3 - Conference contribution
AN - SCOPUS:79952405735
SN - 9781424483075
T3 - Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
SP - 29
EP - 34
BT - Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
Y2 - 18 December 2010 through 21 December 2010
ER -