Parallel lasso screening for big data optimization

Qingyang Li, Shuang Qiu, Shuiwang Ji, Paul M. Thompson, Jieping Ye, Jie Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

5 Scopus citations

Abstract

Lasso regression is a widely used technique in data mining for model selection and feature extraction. In many applications, it remains challenging to apply the regression model to large-scale problems that have massive data samples with high-dimensional features. One popular and promising strategy is to solve the Lasso problem in parallel. Parallel solvers run on multiple cores of a shared-memory system to speed up the computation, but their practical use is limited by the huge dimension of the feature space. Screening is a promising way to address high dimensionality: it identifies the inactive features and removes them from the optimization. However, when screening methods are integrated with parallel solvers, most solvers cannot guarantee convergence on the reduced feature matrix. In this paper, we propose a novel parallel framework that parallelizes screening methods and integrates them with our proposed parallel solver. We propose two parallel screening algorithms: Parallel Strong Rule (PSR) and Parallel Dual Polytope Projection (PDPP). For the parallel solver, we propose an Asynchronous Grouped Coordinate Descent method (AGCD) to optimize the regression problem in parallel on the reduced feature matrix. AGCD is based on a grouped selection strategy that selects, from a group of candidates, the coordinate that yields the maximum descent of the objective function. Empirical studies on real-world datasets demonstrate that the proposed parallel framework outperforms state-of-the-art parallel solvers.
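To make the two ideas in the abstract concrete, the following is a minimal, single-threaded sketch in Python. It is not the paper's PSR, PDPP, or AGCD: the screening step uses the basic strong rule of Tibshirani et al. as a stand-in, and the solver is a sequential coordinate descent that, within each group of candidates, updates the coordinate with the largest decrease of the objective. The objective scaling (1/2)||y - Xb||^2 + lam*||b||_1, the function names, and the parameters `n_groups` and `n_sweeps` are all assumptions made for illustration.

```python
import numpy as np


def lasso_objective(X, y, beta, lam):
    """Objective assumed here: 0.5 * ||y - X beta||^2 + lam * ||beta||_1."""
    r = y - X @ beta
    return 0.5 * (r @ r) + lam * np.abs(beta).sum()


def strong_rule_keep(X, y, lam):
    """Basic strong rule (a heuristic): keep j if |x_j^T y| >= 2*lam - lam_max."""
    corr = np.abs(X.T @ y)
    lam_max = corr.max()  # smallest lam for which the solution is all zeros
    return np.flatnonzero(corr >= 2.0 * lam - lam_max)


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def grouped_greedy_cd(X, y, lam, n_groups=4, n_sweeps=200):
    """Sequential sketch: per group, update the coordinate with maximum descent.

    Assumes every column of X is nonzero.
    """
    p = X.shape[1]
    beta = np.zeros(p)
    r = y.astype(float).copy()           # residual y - X @ beta
    sq = (X ** 2).sum(axis=0)            # squared column norms
    groups = np.array_split(np.arange(p), n_groups)
    for _ in range(n_sweeps):
        for g in groups:
            if g.size == 0:
                continue
            rho = X[:, g].T @ r          # x_j^T r for each candidate in the group
            new = soft_threshold(rho + sq[g] * beta[g], lam) / sq[g]
            d = new - beta[g]
            # Exact objective decrease if only coordinate j were updated.
            gain = d * rho - 0.5 * sq[g] * d ** 2 + lam * (np.abs(beta[g]) - np.abs(new))
            k = int(np.argmax(gain))
            j = g[k]
            r -= d[k] * X[:, j]          # keep the residual in sync
            beta[j] = new[k]
    return beta
```

A typical use would be `keep = strong_rule_keep(X, y, lam)` followed by `grouped_greedy_cd(X[:, keep], y, lam)`. Because the strong rule is a heuristic, the discarded features would normally be checked against the KKT conditions afterwards; that check is omitted from this sketch.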

Original language: English
Title of host publication: KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1705-1714
Number of pages: 10
ISBN (Electronic): 9781450342322
DOIs
State: Published - 13 Aug 2016
Externally published: Yes
Event: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 - San Francisco, United States
Duration: 13 Aug 2016 → 17 Aug 2016

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 13-17-August-2016

Conference

Conference: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
Country/Territory: United States
City: San Francisco
Period: 13/08/16 → 17/08/16

Keywords

  • Asynchronized coordinate descent
  • Coordinate descent
  • Lasso regression
  • Parallel computing
  • Screening rules
