Inferring the semantic relationships of words within an ontology using random indexing: applications to pharmacogenomics.

Bethany Percha, Russ B. Altman

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time-consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology and can help localize these new labels among classes and roles within the ontology.
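
For readers unfamiliar with random indexing, the general idea can be sketched in a few lines of Python. This is a minimal illustration of the technique as it is commonly described, not the authors' implementation: the dimensionality, the number of nonzero entries, the window size, the function names, and the three-sentence toy corpus are all assumptions chosen for readability.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Build a sparse ternary index vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def random_indexing(sentences, dim=300, nonzeros=10, window=2, seed=0):
    """Return a context vector per word: the sum of the index vectors of its neighbors."""
    rng = random.Random(seed)
    index = {}                                # word -> fixed random index vector
    context = defaultdict(lambda: [0.0] * dim)  # word -> accumulated context vector
    for tokens in sentences:
        for tok in tokens:
            if tok not in index:
                index[tok] = index_vector(dim, nonzeros, rng)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                ctx = context[word]
                for k, value in enumerate(index[tokens[j]]):
                    if value:
                        ctx[k] += value
    return dict(context)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus; a real application would use a large unlabeled corpus.
sentences = [
    ["warfarin", "is", "metabolized", "by", "cyp2c9"],
    ["clopidogrel", "is", "metabolized", "by", "cyp2c19"],
    ["cyp2c9", "variants", "alter", "warfarin", "dose"],
]
vectors = random_indexing(sentences)
similarity = cosine(vectors["warfarin"], vectors["clopidogrel"])
```

Words that occur in similar contexts accumulate similar context vectors, so their cosine similarity is high. Scores of this kind are what the abstract refers to when it compares random indexing output against the structure of PHARE.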

Original language: English
Pages (from-to): 1123-1132
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
Volume: 2013
State: Published - 2013
Externally published: Yes
