TY - GEN
T1 - Systematic comparison of machine learning methods for identification of miRNA species as disease biomarkers
AU - Higuchi, Chihiro
AU - Tanaka, Toshihiro
AU - Okada, Yukinori
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - MicroRNAs (miRNAs) play important roles in a variety of biological processes and can act as disease biomarkers. Thus, methods for discovering disease-related miRNAs are needed. Human omics data, including miRNA expression profiles, typically have orders of magnitude more descriptors (p) than samples (n), the so-called “p >> n problem”. Because traditional statistical methods can be misled into localized solutions in this setting, machine learning (ML) methods that perform sparse variable selection are expected to solve this problem. Among the many ML methods, the least absolute shrinkage and selection operator (LASSO) and multivariate adaptive regression splines (MARS) select a small number of variables through supervised learning with endpoints such as human disease status. Here, we performed a systematic comparison of LASSO and MARS for biomarker discovery, using six miRNA expression data sets of human disease samples obtained from the NCBI Gene Expression Omnibus (GEO). We additionally applied partial least squares discriminant analysis (PLS-DA) as a traditional control method to evaluate the baseline performance of discriminant methods. We observed that LASSO and MARS showed relatively higher performance than PLS-DA as the number of samples increased. In addition, some of the miRNA species identified by the ML methods have already been reported as candidate disease biomarkers in previous biological studies. These findings should extend our knowledge of ML method performance in the empirical use of clinical data.
AB - MicroRNAs (miRNAs) play important roles in a variety of biological processes and can act as disease biomarkers. Thus, methods for discovering disease-related miRNAs are needed. Human omics data, including miRNA expression profiles, typically have orders of magnitude more descriptors (p) than samples (n), the so-called “p >> n problem”. Because traditional statistical methods can be misled into localized solutions in this setting, machine learning (ML) methods that perform sparse variable selection are expected to solve this problem. Among the many ML methods, the least absolute shrinkage and selection operator (LASSO) and multivariate adaptive regression splines (MARS) select a small number of variables through supervised learning with endpoints such as human disease status. Here, we performed a systematic comparison of LASSO and MARS for biomarker discovery, using six miRNA expression data sets of human disease samples obtained from the NCBI Gene Expression Omnibus (GEO). We additionally applied partial least squares discriminant analysis (PLS-DA) as a traditional control method to evaluate the baseline performance of discriminant methods. We observed that LASSO and MARS showed relatively higher performance than PLS-DA as the number of samples increased. In addition, some of the miRNA species identified by the ML methods have already been reported as candidate disease biomarkers in previous biological studies. These findings should extend our knowledge of ML method performance in the empirical use of clinical data.
KW - LASSO
KW - Least absolute shrinkage and selection operator
KW - MARS
KW - Machine learning
KW - miRNA
KW - Micro RNA
KW - Multivariate adaptive regression splines
UR - http://www.scopus.com/inward/record.url?scp=84925263226&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-16480-9_38
DO - 10.1007/978-3-319-16480-9_38
M3 - Conference contribution
AN - SCOPUS:84925263226
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 386
EP - 394
BT - Bioinformatics and Biomedical Engineering - 3rd International Conference, IWBBIO 2015, Proceedings
A2 - Ortuño, Francisco
A2 - Rojas, Ignacio
PB - Springer Verlag
T2 - 3rd International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2015
Y2 - 15 April 2015 through 17 April 2015
ER -