Review of programming models for data-intensive computing

Peng Wang, Dan Meng, Jianfeng Zhan, Bibo Tu

Research output: Contribution to journal, review article, peer-reviewed

Abstract

Advances in communication, computation, and storage have created large amounts of data. The ability to collect, organize, and analyze massive amounts of data could lead to breakthroughs in business, science, and society. As a new computing paradigm, cloud computing focuses on Internet services, and Internet service providers have an increasing need to store and analyze massive data sets. To perform Web-scale analysis cost-effectively, several Internet companies have recently developed distributed programming systems on large-scale clusters of shared-nothing commodity servers, which we call cloud platforms. It is a great challenge to design a programming model and system that lets developers easily write reliable programs that use cluster-wide resources efficiently and achieve the maximum degree of parallelism on a cloud platform. Many challenging and exciting research problems arise when scaling systems and computations up to handle terabyte-scale datasets. Recent advances in programming models for massive data processing are reviewed in this context. First, the unique characteristics of data-intensive computing are presented and the fundamental issues of programming models for massive data processing are pointed out. Second, several state-of-the-art programming systems for data-intensive computing are described in detail. Third, the pros and cons of the classic programming models are compared and discussed. Finally, open issues and future work in this field are explored.

Original language: English
Pages (from-to): 1993-2002
Number of pages: 10
Journal: Jisuanji Yanjiu yu Fazhan/Computer Research and Development
Volume: 47
Issue number: 11
State: Published - Nov 2010
Externally published: Yes

Keywords

  • Cloud computing
  • Data intensive computing
  • Data-parallel
  • Large-scale data processing
  • MapReduce
  • Programming model
