Abstract
Predicting the catalytic sites of a given enzyme is an important open problem in bioinformatics. Many machine learning-based methods have recently been developed, with the advantage that they can account for many sequential and structural features. We found that, although these methods incorporate many kinds of features, protein sequence conservation supplies most of the information they use and is likely to remain important in future methods. We therefore tested several conservation features for their ability to predict catalytic sites, using a Support Vector Machine classifier. Our results suggest that the position-specific scoring matrix performs better than the other features tested, and that incorporating conservation information from sequentially adjacent sites is more effective than incorporating it from structurally adjacent ones. Moreover, although conservation information is effective in predicting catalytic sites, optimizing the combination of conservation features with other features remains a difficult problem.
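
To make the idea of using conservation from sequentially adjacent sites concrete, the following is a minimal sketch of one common way to build such a feature vector: the position-specific scoring matrix rows of a residue and its sequence neighbours are concatenated into a single vector. The function name `window_features`, the `half_width` parameter, and the zero padding at the chain termini are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def window_features(
    pssm: Sequence[Sequence[float]],
    center: int,
    half_width: int,
) -> List[float]:
    """Concatenate the PSSM rows of `center` and its sequence neighbours.

    `pssm` holds one row of scores per residue (20 columns in a real PSSM).
    Positions beyond either end of the chain are zero-padded; this padding
    choice is an assumption made for the sketch.
    """
    if not 0 <= center < len(pssm):
        raise IndexError("center is outside the sequence")
    n_cols = len(pssm[0])
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0.0] * n_cols)
    return features


# Toy matrix with 2 columns per residue instead of 20, for readability.
toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
first_residue = window_features(toy_pssm, center=0, half_width=1)
```

Vectors built this way, one per residue and labelled catalytic or non-catalytic, could then be passed to a Support Vector Machine implementation such as scikit-learn's `sklearn.svm.SVC`; the window size and kernel would need to be tuned and are not specified here.
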
| Original language | English |
|---|---|
| Pages (from-to) | 229-239 |
| Number of pages | 11 |
| Journal | Protein Journal |
| Volume | 30 |
| Issue number | 4 |
| State | Published - Apr 2011 |
| Externally published | Yes |
Keywords
- Catalytic site prediction
- Neighboring sites
- Sequence conservation
- Support vector machine