TY - GEN
T1 - Computational performance of heterogeneous ensemble frameworks on high-performance computing platforms
AU - Wang, Linhua
AU - Timsina, Prem
AU - Pandey, Gaurav
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12/10
Y1 - 2020/12/10
AB - To enable efficient computations on rapidly growing big data, a variety of high-performance computing (HPC) platforms, such as traditional multi-processor systems, Hadoop and cloud computing systems, have been developed. On the analytics side of big data, several innovative machine learning methods have been developed to enable the extraction of accurate and actionable knowledge from large datasets. In particular, heterogeneous ensemble algorithms, which are designed to aggregate an unrestricted variety and number of analytical models, have performed well for a variety of prediction problems. However, the performance of these algorithms in terms of computational metrics, such as time requirement, disk space consumption and memory usage, on these HPC platforms has not been systematically examined yet. Here, we address this gap in knowledge by implementing these algorithms and systematically assessing their computational performance on traditional HPC and Hadoop platforms. Our results show that these implementations used the resources, especially disk space and memory, consistent with the respective designs of the platforms. Furthermore, due to the iterative nature of the heterogeneous ensemble computations, the traditional HPC system executed them faster than Hadoop, since an in-memory design is better suited for them than a disk-based one. Overall, our study sheds new light on the computational performance of ensemble algorithms and software frameworks on two prominent HPC platforms, and offers a systematic methodology for conducting similar assessments for other data analytics methods as well. Basic source code of our heterogeneous ensemble implementations, as well as the HPC performance assessments, are available at https://github.com/GauravPandeyLab/HPC-Ensemble.
KW - Ensembles
KW - Hadoop
KW - computational performance
KW - high-performance computing
KW - predictive modeling
UR - http://www.scopus.com/inward/record.url?scp=85103830760&partnerID=8YFLogxK
U2 - 10.1109/BigData50022.2020.9378392
DO - 10.1109/BigData50022.2020.9378392
M3 - Conference contribution
AN - SCOPUS:85103830760
T3 - Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
SP - 2843
EP - 2850
BT - Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
A2 - Wu, Xintao
A2 - Jermaine, Chris
A2 - Xiong, Li
A2 - Hu, Xiaohua Tony
A2 - Kotevska, Olivera
A2 - Lu, Siyuan
A2 - Xu, Weijia
A2 - Aluru, Srinivas
A2 - Zhai, Chengxiang
A2 - Al-Masri, Eyhab
A2 - Chen, Zhiyuan
A2 - Saltz, Jeff
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th IEEE International Conference on Big Data, Big Data 2020
Y2 - 10 December 2020 through 13 December 2020
ER -