Using PharmGKB to train text mining approaches for identifying potential gene targets for pharmacogenomic studies

S. Pakhomov, B. T. McInnes, J. Lamba, Y. Liu, G. B. Melton, Y. Ghodke, N. Bhise, V. Lamba, A. K. Birnbaum

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

The main objective of this study was to investigate the feasibility of using PharmGKB, a pharmacogenomic database, as a source of training data in combination with the text of MEDLINE abstracts for a text mining approach to identifying potential gene targets for pathway-driven pharmacogenomics research. We used the manually curated relations between drugs and genes in the PharmGKB database to train a support vector machine predictive model and applied this model prospectively to MEDLINE abstracts. The gene targets suggested by this approach were subsequently manually reviewed. Our quantitative analysis showed that support vector machine classifiers trained on MEDLINE abstracts, with single words (unigrams) used as features and PharmGKB relations used for supervision, achieved an overall sensitivity of 85% and specificity of 69%. The subsequent qualitative analysis showed that gene targets "suggested" by the automatic classifier were not anticipated by expert reviewers but were subsequently found to be relevant to the three drugs that were investigated: carbamazepine, lamivudine and zidovudine. Our results show that this approach is not only feasible but may also find new gene targets not identifiable by other methods, making it a valuable tool for pathway-driven pharmacogenomics research.
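As a rough illustration of two terms used in the abstract, the sketch below shows how unigram (single-word) counts can be extracted from a piece of text and how sensitivity and specificity are computed from binary labels and predictions. It is not the authors' pipeline: the classifier itself is omitted, and the function names, the simple tokenizer, and the toy labels are assumptions made for this example only.

```python
import re
from collections import Counter


def unigram_features(text):
    """Lowercase the text and count each single-word token (unigram).

    The regex tokenizer here is an assumption; the study's actual
    tokenization is not described in the abstract.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy usage with made-up text and labels (not data from the study).
features = unigram_features("Carbamazepine induces CYP3A4; carbamazepine")
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Sensitivity is the fraction of true drug-gene relations the classifier recovers, and specificity is the fraction of non-relations it correctly rejects, so the reported 85% and 69% describe these two ratios on the study's own evaluation data.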

Original language: English
Pages (from-to): 862-869
Number of pages: 8
Journal: Journal of Biomedical Informatics
Volume: 45
Issue number: 5
State: Published - Oct 2012
Externally published: Yes

Keywords

  • Gene-drug associations
  • Pathway-driven analysis
  • PharmGKB
  • Pharmacogenomics
  • Support vector machine
  • Text mining
