TY - JOUR
T1 - Hierarchical-TGDR
T2 - Combining biological hierarchy with a regularization method for multi-class classification of lung cancer samples via high-throughput gene-expression data
AU - Tian, Suyan
AU - Suárez-Fariñas, Mayte
N1 - Publisher Copyright: © 2013 Landes Bioscience.
PY - 2014
Y1 - 2014
N2 - Regularization methods that simultaneously select a small set of the most relevant features and build a classifier using the selected features have gained much attention recently in problems of classification of "omics" data. In many multiclass classification problems, which are of practical importance, the classes are naturally endowed with a hierarchical structure. However, such natural hierarchical structure is often ignored. Here, we use an existing regularization algorithm, Threshold Gradient Descent Regularization, in a hierarchical fashion, which takes advantage of natural biological structure to specifically tackle multi-class classification of microarray data. We apply this approach to one of the tasks presented by the sbv IMPROVER Diagnostic Signature Challenge: the Lung Cancer Sub-Challenge. Gene expression data from non-small cell lung carcinoma were used to classify tumors into adenocarcinoma and squamous cell carcinoma subtypes, and their clinical stages (I and II). Genetic and transcriptomic differences between AC and SCC have been reported, indicating a potentially different pathological mechanism of differentiation and invasion. The results from this analysis show that hierarchical-TGDR outperforms pairwise TGDRs in terms of predictive performance, and is substantially more parsimonious. In conclusion, the hierarchical-TGDR approach trains classifiers in a top-down fashion by considering the naturally existing structure within the data, reducing the number of pairwise-TGDRs to be trained. It also highlights different mechanisms of "invasion" in the two subtypes. This work suggests that incorporating known biological information into classification algorithms, such as data hierarchies, can improve the discriminative performance and biological interpretation of this classifier.
KW - Adenocarcinoma
KW - Hierarchical structure
KW - Lung cancer
KW - Multiclass classification
KW - Squamous cell carcinoma
KW - Threshold gradient descent regularization
UR - http://www.scopus.com/inward/record.url?scp=85043224860&partnerID=8YFLogxK
U2 - 10.4161/sysb.25979
DO - 10.4161/sysb.25979
M3 - Article
AN - SCOPUS:85043224860
SN - 2162-8130
VL - 1
SP - 278
EP - 287
JO - Systems Biomedicine
JF - Systems Biomedicine
IS - 4
ER -