TY - GEN
T1 - Parallel lasso screening for big data optimization
AU - Li, Qingyang
AU - Qiu, Shuang
AU - Ji, Shuiwang
AU - Thompson, Paul M.
AU - Ye, Jieping
AU - Wang, Jie
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/8/13
Y1 - 2016/8/13
AB - Lasso regression is a widely used technique in data mining for model selection and feature extraction. In many applications, it remains challenging to apply the regression model to large-scale problems that have massive data samples with high-dimensional features. One popular and promising strategy is to solve the Lasso problem in parallel. Parallel solvers run multiple cores on a shared-memory system to speed up the computation, but their practical use is limited by the huge dimensionality of the feature space. Screening is a promising method for addressing high dimensionality by discarding inactive features and removing them from the optimization. However, when screening methods are integrated with parallel solvers, most solvers cannot guarantee convergence on the reduced feature matrix. In this paper, we propose a novel parallel framework that parallelizes screening methods and integrates them with our proposed parallel solver. We propose two parallel screening algorithms: Parallel Strong Rule (PSR) and Parallel Dual Polytope Projection (PDPP). For the parallel solver, we propose an Asynchronous Grouped Coordinate Descent method (AGCD) to optimize the regression problem in parallel on the reduced feature matrix. AGCD is based on a grouped selection strategy that selects the coordinate with the maximum descent of the objective function from a group of candidates. Empirical studies on real-world datasets demonstrate that the proposed parallel framework achieves superior performance compared to state-of-the-art parallel solvers.
KW - Asynchronous coordinate descent
KW - Coordinate descent
KW - Lasso regression
KW - Parallel computing
KW - Screening rules
UR - http://www.scopus.com/inward/record.url?scp=84985041112&partnerID=8YFLogxK
U2 - 10.1145/2939672.2939859
DO - 10.1145/2939672.2939859
M3 - Conference contribution
AN - SCOPUS:84985041112
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1705
EP - 1714
BT - KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
Y2 - 13 August 2016 through 17 August 2016
ER -