III: Small: Bit-Parallel Algorithms for Sequence Alignment and Applications in Detecting Human Genetic Variation and Bacterial Strain Typing

  • Benson, Gary (PI)

Project Details


This project underscores the close link between high-throughput sequencing and the wide range of applications that are critical to understanding the biological world. Data analysis is the bottleneck in sequencing. This project will lead to new efficiencies in sequence comparison and to a new compendium of genetic variation. The work will provide support for both graduate and undergraduate students, the latter especially through a new site the PI has established to recruit students from underrepresented groups. The BitPal algorithms will be widely distributed and will facilitate sequence alignment in a variety of areas. The work also directly enhances the infrastructure for research in genomics, while the new bacterial strain typing protocol will facilitate the epidemiological study of pathogenic infections and the analysis of biothreat agents.

The deluge of genomic sequence data has outpaced the growth in speed of modern computers. For sequence analysis programs such as VNTRseek to keep pace with data growth, extremely high-efficiency algorithms are required. This project will develop a novel class of bit-parallel alignment algorithms that exploit the inherent parallelism in computer logic operations and will yield dramatic acceleration of sequence comparison tasks. The project will achieve the following:

  • A variety of multi-purpose, high-efficiency bit-parallel alignment algorithms will be designed and implemented for CPU and GPU (graphics processing unit) architectures. A website will be created for easy download and use of the code implementing the algorithms. The bit-parallel techniques to be developed are novel and extend a successful method to a larger class of problems. The algorithms will be widely applicable in bioinformatics and in other fields that involve text comparison. Implementation on GPUs will expand the domain of this underutilized computing architecture.

  • The bit-parallel algorithms will replace less efficient algorithms in VNTRseek and TRF (Tandem Repeats Finder) in order to accelerate the analysis of large data sets. Initial analysis will identify VNTRs (variable number tandem repeats) in 380 human genomes (moderate to high coverage) from the 1000 Genomes Project. Additional genomes will be analyzed as they become available. An online database will be created to store the VNTR data and provide online tools for VNTR analysis. The genome-wide occurrence of VNTRs is currently unknown, so the VNTRseek analysis of human genomes will result in an entirely new compendium of human genetic variation.

  • A bacterial strain typing protocol will be developed for pathogenic bacteria, based on multilocus VNTR analysis (MLVA), using high-throughput sequencing data as an alternative to slower techniques. This will facilitate the epidemiological study of pathogenic bacterial infections and the analysis of biothreat agents for national security.
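
To make the idea of bit-parallelism concrete, the sketch below computes unit-cost edit distance with a well-known published bit-vector method (Myers' algorithm, in the formulation by Hyyrö). It is an illustration only and is not the BitPal code: the function name and the choice of edit distance as the scoring scheme are assumptions made for this example. Each column of the alignment matrix is encoded as bit vectors of vertical differences, so one pass of logic and addition operations updates a whole column at once instead of one cell at a time.

```python
def bit_parallel_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via a Myers/Hyyro-style bit-vector recurrence.

    Illustrative sketch only; not the project's BitPal implementation.
    Python integers are unbounded, so `mask` emulates an m-bit machine word.
    """
    m = len(a)
    if m == 0:
        return len(b)

    mask = (1 << m) - 1      # keeps values within m bits
    high = 1 << (m - 1)      # bit for the last row of the column

    # Match masks: bit i of peq[c] is set when a[i] == c.
    peq = {}
    for i, ch in enumerate(a):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv = mask                # vertical differences of +1 (first column: 1, 2, ..., m)
    mv = 0                   # vertical differences of -1
    score = m                # value in the last row of the current column

    for ch in b:
        eq = peq.get(ch, 0)
        xv = eq | mv
        # The addition propagates carries through runs of set bits,
        # resolving a whole column's dependencies in one operation.
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | ~(xh | pv)  # horizontal differences of +1
        mh = pv & xh          # horizontal differences of -1

        if ph & high:
            score += 1
        elif mh & high:
            score -= 1

        # Shift in a 1: the top boundary row increases by 1 per column.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv

    return score


if __name__ == "__main__":
    print(bit_parallel_edit_distance("ACGTACGT", "ACGTTCGT"))
```

On real hardware the same recurrence runs on fixed-width machine words (for example 64 bits), which is where the speed-up over cell-by-cell dynamic programming comes from; patterns longer than one word need a multi-word carry, which this sketch sidesteps by relying on Python's arbitrary-precision integers.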

Effective start/end date: 1/10/14 to 30/09/19


  • National Science Foundation: $560,752.00

