Abstract
Sequence motif finding is a very important and long-studied problem in computational molecular biology. While various motif representations and discovery methods exist, a recent development of graph-based algorithms has allowed practical concerns, such as positional correlations within motifs, to be taken into account. This survey provides an overview of the multi-partite graph formulation of motif finding, and focuses on algorithmic aspects of various motif discovery methodologies. Motif finding has been recast as a number of different graph substructure identification problems. First we review a formulation as a maximum-weight clique finding problem, and examine two different integer linear programs to model it. The motif finding algorithms use graph pruning techniques and a cutting planes approach in conjunction with linear programming relaxations. Secondly, we discuss a formulation of motif discovery as that of maximum density subgraph finding, and review a maximum flow based algorithm in an appropriately augmented flow network. Finally, we mention the ‘subtle’ motifs formulation, and define its corresponding graph problem of maximal clique identification. We discuss two different approaches to tackle this problem, one based on winnowing spurious edges and the other on divide-and-conquer sub-clique finding.
Original language | English |
---|---|
Title of host publication | Clustering Challenges in Biological Networks |
Publisher | World Scientific Publishing Co. |
Pages | 83-99 |
Number of pages | 17 |
ISBN (Electronic) | 9789812771667 |
ISBN (Print) | 9812771654, 9789812771650 |
DOIs | |
State | Published - 1 Jan 2009 |
Externally published | Yes |