Computational performance of heterogeneous ensemble frameworks on high-performance computing platforms

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

To enable efficient computations on rapidly growing big data, a variety of high-performance computing (HPC) platforms, such as traditional multi-processor systems, Hadoop and cloud computing systems, have been developed. On the analytics side of big data, several innovative machine learning methods have been developed to enable the extraction of accurate and actionable knowledge from large datasets. In particular, heterogeneous ensemble algorithms, which are designed to aggregate an unrestricted variety and number of analytical models, have performed well for a variety of prediction problems. However, the performance of these algorithms in terms of computational metrics, such as time requirement, disk space consumption and memory usage, on these HPC platforms has not yet been systematically examined. Here, we address this gap in knowledge by implementing these algorithms and systematically assessing their computational performance on traditional HPC and Hadoop platforms. Our results show that these implementations used resources, especially disk space and memory, in a manner consistent with the respective designs of the platforms. Furthermore, due to the iterative nature of the heterogeneous ensemble computations, the traditional HPC system executed them faster than Hadoop, since an in-memory design is better suited to them than a disk-based one. Overall, our study sheds new light on the computational performance of ensemble algorithms and software frameworks on two prominent HPC platforms, and offers a systematic methodology for conducting similar assessments for other data analytics methods as well. Basic source code of our heterogeneous ensemble implementations, as well as of the HPC performance assessments, is available at https://github.com/GauravPandeyLab/HPC-Ensemble.
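As a rough illustration of the kind of measurement the abstract describes, the following minimal sketch aggregates a few heterogeneous base predictors by unweighted averaging and records wall-clock time and peak Python-level memory for the run. It is not the authors' implementation: the base models, the averaging rule, the `profile` helper and the toy data are all hypothetical, and only the Python standard library is used.

```python
import time
import tracemalloc
from statistics import mean

# Hypothetical base models (assumptions, not from the paper):
# each maps a feature vector to a score in [0, 1].
def threshold_model(x):
    return 1.0 if x[0] > 0.5 else 0.0

def average_model(x):
    return sum(x) / len(x)

def max_model(x):
    return max(x)

BASE_MODELS = [threshold_model, average_model, max_model]

def ensemble_predict(x, models=BASE_MODELS):
    """Aggregate heterogeneous base predictions by unweighted averaging."""
    return mean(m(x) for m in models)

def predict_all(rows):
    """Apply the ensemble to every row of a small in-memory dataset."""
    return [ensemble_predict(r) for r in rows]

def profile(fn, *args):
    """Return (result, elapsed_seconds, peak_bytes) for a single call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

# Toy data, purely illustrative.
rows = [[0.2, 0.4], [0.9, 0.7], [0.6, 0.0]]
predictions, seconds, peak_bytes = profile(predict_all, rows)
print(f"predictions={predictions} time={seconds:.6f}s peak={peak_bytes}B")
```

Note that `tracemalloc` only tracks allocations made through Python's allocator, so it says nothing about the disk usage or cluster-level memory that a study on HPC and Hadoop platforms would need to capture.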

Original language: English
Title of host publication: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
Editors: Xintao Wu, Chris Jermaine, Li Xiong, Xiaohua Tony Hu, Olivera Kotevska, Siyuan Lu, Weijia Xu, Srinivas Aluru, Chengxiang Zhai, Eyhab Al-Masri, Zhiyuan Chen, Jeff Saltz
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2843-2850
Number of pages: 8
ISBN (Electronic): 9781728162515
State: Published - 10 Dec 2020
Event: 8th IEEE International Conference on Big Data, Big Data 2020 - Virtual, Atlanta, United States
Duration: 10 Dec 2020 - 13 Dec 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020

Conference

Conference: 8th IEEE International Conference on Big Data, Big Data 2020
Country/Territory: United States
City: Virtual, Atlanta
Period: 10/12/20 - 13/12/20

Keywords

  • Ensembles
  • Hadoop
  • computational performance
  • high-performance computing
  • predictive modeling
