A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent

James Yang; Trevor Hastie

TL;DR

This work introduces a fast, scalable block-coordinate descent solver for the group lasso and group elastic net under Gaussian loss, extended to general convex losses via proximal quasi-Newton. A key novelty is solving each block update with Newton's method after rotating to the eigenbasis of $X_g^T W X_g$, which reduces the subproblem to a diagonal quadratic form and a one-dimensional root find for $\|x\|_2$. The algorithm combines a pathwise strategy, screening rules, and Newton-ABS (Newton's method with adaptive bisection starts), which solves each block update with quadratic convergence; empirically it runs 3–10× faster than existing solvers and is competitive with glmnet on the lasso. Its multi-response and GLM extensions, plus a matrix abstraction that accommodates structured data (e.g., GWAS), broaden its applicability to large-scale, real-world problems with many groups and complex loss surfaces.
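To make the reduction concrete, here is a short derivation sketch. It assumes the block subproblem has the form below, with $\Sigma = X_g^T W X_g$ symmetric positive semidefinite and $v$ the corresponding linear term. This form and the function $\varphi$ defined here are our own illustration; the paper's formulation may differ in detail (for example, in how the elastic-net ridge term or group weights enter).

```latex
\begin{aligned}
&\min_{x}\; \tfrac12 x^\top \Sigma x - v^\top x + \lambda \|x\|_2,
  \qquad \Sigma = X_g^\top W X_g = Q D Q^\top,\quad D = \operatorname{diag}(D_1,\dots,D_p) \succeq 0. \\[4pt]
&\text{Rotate: } z = Q^\top x,\; u = Q^\top v \;\Longrightarrow\;
  \min_{z}\; \tfrac12 z^\top D z - u^\top z + \lambda \|z\|_2,
  \qquad \|z\|_2 = \|x\|_2,\; \|u\|_2 = \|v\|_2. \\[4pt]
&\text{If } \|u\|_2 \le \lambda:\quad z^\star = 0. \\[4pt]
&\text{Otherwise, for } z \neq 0:\quad D z - u + \lambda \frac{z}{\|z\|_2} = 0
  \;\Longrightarrow\; z_k = \frac{h\,u_k}{D_k h + \lambda},\qquad h := \|z\|_2 > 0. \\[4pt]
&\text{Taking norms:}\quad \varphi(h) := \sum_{k} \frac{u_k^2}{(D_k h + \lambda)^2} - 1 = 0, \\[4pt]
&\varphi(0) = \frac{\|u\|_2^2}{\lambda^2} - 1 > 0,
  \qquad \lim_{h\to\infty} \varphi(h) = \frac{1}{\lambda^2}\sum_{k:\,D_k = 0} u_k^2 - 1.
\end{aligned}
```

When $\|u\|_2 > \lambda$ and $\sum_{k: D_k = 0} u_k^2 < \lambda^2$, $\varphi$ is convex and strictly decreasing on $h \ge 0$, starts positive, and has a negative limit, so it has a unique positive root $h^\star$; the minimizer is then $x^\star = Q z^\star$ with $z^\star_k = h^\star u_k / (D_k h^\star + \lambda)$. Because $\varphi$ is convex and decreasing, Newton's method started at $h = 0$ increases monotonically to $h^\star$.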

Abstract

We develop fast and scalable algorithms based on block-coordinate descent to solve the group lasso and the group elastic net for generalized linear models along a regularization path. Special attention is given to the case where the loss is the usual least squares loss (Gaussian loss). We show that each block-coordinate update can be solved efficiently using Newton's method, and further improved using an adaptive bisection method, so that these updates are solved with a quadratic convergence rate. Our benchmarks show that our package adelie runs 3 to 10 times faster than the next fastest package on a wide array of both simulated and real datasets. Moreover, we demonstrate that our package is also a competitive lasso solver, matching the performance of the popular lasso package glmnet.
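A minimal sketch of pathwise block-coordinate descent for the Gaussian group lasso, under stated assumptions: the objective is taken as $\frac12\|y - X\beta\|_2^2 + \lambda \sum_g \|\beta_g\|_2$ (no group weights, no $1/n$ scaling, no elastic-net term), and every block $X_g$ is assumed to have full column rank so that $X_g^T X_g$ is positive definite. Each block is solved by rotating to the eigenbasis of $X_g^T X_g$ and running Newton's method on the one-dimensional equation for $\|\beta_g\|_2$; the path starts at the smallest $\lambda$ for which $\beta = 0$ and warm-starts each fit from the previous one. The names `solve_block` and `group_lasso_path` are hypothetical and are not the adelie API.

```python
import numpy as np


def solve_block(A, v, lam, tol=1e-12, max_iters=200):
    """Minimize 0.5 x'Ax - v'x + lam*||x||_2 for symmetric positive definite A.

    Rotates to the eigenbasis of A and runs Newton's method on
    phi(h) = sum_k u_k^2 / (D_k h + lam)^2 - 1, where h = ||x||_2.
    """
    if np.linalg.norm(v) <= lam:          # subgradient condition: x = 0 is optimal
        return np.zeros_like(v)
    D, Q = np.linalg.eigh(A)              # A = Q diag(D) Q'
    u = Q.T @ v                           # rotation preserves the 2-norm
    h = 0.0                               # phi is convex and decreasing: start left of the root
    for _ in range(max_iters):
        denom = D * h + lam
        phi = np.sum(u**2 / denom**2) - 1.0
        dphi = -2.0 * np.sum(u**2 * D / denom**3)
        step = phi / dphi
        h -= step
        if abs(step) <= tol * max(1.0, h):
            break
    return Q @ (h * u / (D * h + lam))    # rotate the solution back


def group_lasso_path(X, y, groups, n_lambdas=20, ratio=1e-2, tol=1e-8, max_sweeps=1000):
    """Pathwise BCD for 0.5*||y - X beta||^2 + lam * sum_g ||beta_g||_2.

    `groups` is a list of integer index arrays partitioning the columns of X.
    """
    y = np.asarray(y, dtype=float)
    grams = [X[:, g].T @ X[:, g] for g in groups]                 # X_g' X_g, computed once
    lam_max = max(np.linalg.norm(X[:, g].T @ y) for g in groups)  # smallest lam with beta = 0
    lams = lam_max * np.geomspace(1.0, ratio, n_lambdas)
    beta = np.zeros(X.shape[1])
    r = y.copy()                                                  # residual y - X beta
    path = []
    for lam in lams:                                              # warm start from previous lam
        for _ in range(max_sweeps):
            max_change = 0.0
            for g, A in zip(groups, grams):
                old = beta[g]                                     # fancy indexing returns a copy
                v = X[:, g].T @ r + A @ old                       # linear term of the block problem
                delta = solve_block(A, v, lam) - old
                if np.any(delta != 0.0):
                    beta[g] = old + delta
                    r -= X[:, g] @ delta                          # keep the residual in sync
                    max_change = max(max_change, np.max(np.abs(delta)))
            if max_change < tol:
                break
        path.append(beta.copy())
    return lams, np.array(path)
```

For example, with `groups = [np.arange(0, 3), np.arange(3, 6)]`, calling `lams, path = group_lasso_path(X, y, groups)` returns the $\lambda$ grid and one coefficient vector per $\lambda$. Screening and active sets, the Newton-ABS start, and the other refinements described in the paper are deliberately left out of this sketch.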

Paper Structure (26 sections, 2 theorems, 53 equations, 7 figures, 1 table, 4 algorithms)

Introduction
Preliminaries and Notations
Algorithms for Group Lasso/Elastic Net under Gaussian Loss
Vanilla Newton's Method-Based Algorithm
Newton's Method with Adaptive Bisection Starts Algorithm
Naive and Covariance Updates
Matrix Abstraction
Pathwise Block-Coordinate Descent
Screen and Active Sets
Convergence Criterion
Lasso Optimization
Algorithms for Regularized Generalized Linear Models
Application on Generalized Linear Models
Screen Sets and the KKT Check
Numerical Stability Issues
...and 11 more sections

Key Result

Theorem 3.1

Consider the optimization problem (eq:algorithm:bcd-general) with $\lambda > 0$. If the following condition holds, then a finite minimizer exists. Moreover, the minimizer is given by

Figures (7)

Figure 1: A comparison of the coefficient profiles for the group lasso, the group elastic net, and the ridge on the Leukemia dataset of Golub et al. (1999). From left to right, more features enter the model and the coefficients are shrunk overall toward zero, until the ridge includes every feature.
Figure 2: Time and accuracy comparison between proximal gradient methods and our Newton's method. The proximal gradient methods suffer from large computational overhead due to their large number of iterations and fail to converge as the dimension increases. In contrast, Newton's method is stable, converges properly, and runs 10 to 1000 times faster.
Figure 3: Iterates of the vanilla Newton's method (left) and Newton-ABS (right). In both plots, the iterates are displayed by their value of $\varphi$, and their path is indicated by a color gradient. The iterates are guaranteed to move from left to right, and the solution is marked with a star. For visualization, the axes use a symmetric log transformation, which deceptively makes $\varphi$ look non-convex. Newton's method struggles when $\varphi$ decays sharply near the origin, because its iterates exit the kink slowly. Newton-ABS avoids this problem by using an adaptive bisection strategy to find a good initial point sufficiently far from the origin.
Figure 4: Timing comparisons against existing R packages for solving the group lasso under the Gaussian loss and the Binomial loss. We study a small ($n=100$) and a large ($n=1000$) sample-size case, and in each case we vary the (equi-)correlation $\rho$ of the features.
Figure 5: Timing comparisons against existing R packages for solving the group lasso on the real datasets.
...and 2 more figures
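Figure 3 contrasts vanilla Newton with Newton-ABS on the one-dimensional root find. The paper's adaptive bisection rule is not reproduced here; the sketch below is a simpler stand-in under stated assumptions. Writing the block equation as $\varphi(h) = \sum_k u_k^2/(D_k h + \lambda)^2 - 1 = 0$ with all $D_k > 0$ and $\|u\|_2 > \lambda$ (our assumed form), the root lies in $[(\|u\|_2 - \lambda)/\max_k D_k,\ (\|u\|_2 - \lambda)/\min_k D_k]$, because replacing every $D_k$ by its maximum (minimum) yields a function below (above) $\varphi$ with a closed-form root. A few bisection steps shrink this bracket, and Newton then starts from its left end, where $\varphi \ge 0$, so the iterates still increase monotonically to the root. The function names `phi_and_grad`, `newton_from`, and `newton_bisection_start` are hypothetical.

```python
import numpy as np


def phi_and_grad(h, D, u, lam):
    """phi(h) = sum_k u_k^2 / (D_k h + lam)^2 - 1 and its derivative."""
    denom = D * h + lam
    return np.sum(u**2 / denom**2) - 1.0, -2.0 * np.sum(u**2 * D / denom**3)


def newton_from(h, D, u, lam, tol=1e-12, max_iters=1000):
    """Newton's method on phi from a start with phi(h) >= 0; returns (root, iterations)."""
    for it in range(max_iters):
        f, df = phi_and_grad(h, D, u, lam)
        step = f / df
        h -= step
        if abs(step) <= tol * max(1.0, h):
            return h, it + 1
    return h, max_iters


def newton_bisection_start(D, u, lam, n_bisect=5):
    """Bracket the root with eigenvalue bounds, bisect a few times, then run Newton.

    Assumes all D_k > 0 and ||u||_2 > lam (a simplified stand-in, not Newton-ABS itself).
    """
    excess = np.linalg.norm(u) - lam
    lo, hi = excess / D.max(), excess / D.min()   # phi(lo) >= 0 >= phi(hi)
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if phi_and_grad(mid, D, u, lam)[0] >= 0.0:
            lo = mid                              # root lies to the right of mid
        else:
            hi = mid
    return newton_from(lo, D, u, lam)             # lo is left of the root
```

For an ill-conditioned example such as `D = np.geomspace(1e-3, 1e3, 10)`, `u = np.ones(10)`, `lam = 0.1`, the iteration counts returned by `newton_from(0.0, D, u, lam)` and `newton_bisection_start(D, u, lam)` can be compared directly; both should return the same root.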

Theorems & Definitions (2)

Theorem 3.1: Sufficient Condition for the Existence of the Minimizer of (eq:algorithm:bcd-general)
Theorem 3.2: Convergence Result of Newton's Method
