Hierarchical-TGDR: Combining biological hierarchy with a regularization method for multi-class classification of lung cancer samples via high-throughput gene-expression data

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Regularization methods that simultaneously select a small set of the most relevant features and build a classifier using the selected features have gained much attention recently in problems of classification of "omics" data. In many multiclass classification problems, which are of practical importance, the classes are naturally endowed with a hierarchical structure. However, such natural hierarchical structure is often ignored. Here, we use an existing regularization algorithm, Threshold Gradient Descent Regularization (TGDR), in a hierarchical fashion, which takes advantage of natural biological structure to specifically tackle multi-class classification of microarray data. We apply this approach to one of the tasks presented by the sbv IMPROVER Diagnostic Signature Challenge: the Lung Cancer Sub-Challenge. Gene expression data from non-small cell lung carcinoma were used to classify tumors into adenocarcinoma (AC) and squamous cell carcinoma (SCC) subtypes, and their clinical stages (I and II). Genetic and transcriptomic differences between AC and SCC have been reported, indicating a potentially different pathological mechanism of differentiation and invasion. The results from this analysis show that hierarchical-TGDR outperforms pairwise TGDRs in terms of predictive performance, and is substantially more parsimonious. In conclusion, the hierarchical-TGDR approach trains classifiers in a top-down fashion by considering the naturally existing structure within the data, reducing the number of pairwise-TGDRs to be trained. It also highlights different mechanisms of "invasion" in the two subtypes. This work suggests that incorporating known biological information into classification algorithms, such as data hierarchies, can improve the discriminative performance and biological interpretation of the resulting classifier.
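The top-down scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary logistic loss, a simple unthresholded intercept update, and synthetic data in place of gene-expression profiles. All function names (`tgdr_fit`, `fit_hierarchy`, `predict_hierarchy`) and parameter values are hypothetical. The thresholding step follows the usual description of threshold gradient descent: at each iteration only coefficients whose gradient magnitude is at least `tau` times the largest gradient magnitude are updated.

```python
import numpy as np

def tgdr_fit(X, y, tau=0.8, step=0.1, n_iter=300):
    """Binary logistic threshold-gradient-descent sketch (assumed form)."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta + intercept)))
        grad = X.T @ (y - prob) / n  # gradient of the mean log-likelihood
        # Thresholding: update only coefficients with large enough gradient.
        mask = np.abs(grad) >= tau * np.max(np.abs(grad))
        beta += step * mask * grad
        # Assumption: the intercept is always updated (not thresholded).
        intercept += step * np.mean(y - prob)
    return beta, intercept

def tgdr_predict(X, model):
    beta, intercept = model
    return (X @ beta + intercept > 0).astype(int)

def fit_hierarchy(X, subtype, stage, **kw):
    """Top node: subtype (0 = AC, 1 = SCC). Child nodes: stage within subtype."""
    top = tgdr_fit(X, subtype, **kw)
    stage_models = {
        s: tgdr_fit(X[subtype == s], stage[subtype == s], **kw) for s in (0, 1)
    }
    return top, stage_models

def predict_hierarchy(X, top, stage_models):
    """Route each sample through the subtype model, then its stage model."""
    sub_hat = tgdr_predict(X, top)
    stage_hat = np.zeros(len(X), dtype=int)
    for s in (0, 1):
        idx = sub_hat == s
        if idx.any():
            stage_hat[idx] = tgdr_predict(X[idx], stage_models[s])
    return sub_hat, stage_hat

# Synthetic stand-in for expression data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
subtype = (X[:, 0] > 0).astype(int)  # feature 0 drives the subtype
stage = np.where(subtype == 0, X[:, 1] > 0, X[:, 2] > 0).astype(int)  # 0 = I, 1 = II

top, stage_models = fit_hierarchy(X, subtype, stage)
sub_hat, stage_hat = predict_hierarchy(X, top, stage_models)
```

With this layout a four-class problem (two subtypes, each with two stages) needs three binary classifiers, whereas a pairwise scheme over four classes would need six. That is one way to read the abstract's point about reducing the number of pairwise-TGDRs to be trained.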

Original language: English
Pages (from-to): 278-287
Number of pages: 10
Journal: Systems Biomedicine
Volume: 1
Issue number: 4
State: Published - 2014
Externally published: Yes

Keywords

  • Adenocarcinoma
  • Hierarchical structure
  • Lung cancer
  • Multiclass classification
  • Squamous cell carcinoma
  • Threshold gradient descent regularization
