A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent

James Yang, Trevor Hastie

TL;DR

This work introduces a fast, scalable block-coordinate descent solver for the group lasso and group elastic net under Gaussian loss, extended to general convex losses via proximal quasi-Newton. A key novelty is solving each block update with a Newton-based method by rotating to the eigenbasis of $X_g^T W X_g$, reducing the subproblem to a diagonal quadratic form and a one-dimensional root find for $\|x\|_2$. The algorithm leverages a pathwise strategy, screening rules, and an adaptive Newton-ABS variant to achieve quadratic convergence and strong empirical speedups (3–10×) over existing solvers, including competitive lasso performance against glmnet. Its multi-response and GLM extensions, plus a matrix abstraction that accommodates structured data (e.g., GWAS), broaden applicability to large-scale, real-world problems with high group counts and complex loss surfaces.

Abstract

We develop fast and scalable algorithms based on block-coordinate descent to solve the group lasso and the group elastic net for generalized linear models along a regularization path. Special attention is given when the loss is the usual least squares loss (Gaussian loss). We show that each block-coordinate update can be solved efficiently using Newton's method and further improved using an adaptive bisection method, solving these updates with a quadratic convergence rate. Our benchmarks show that our package adelie performs 3 to 10 times faster than the next fastest package on a wide array of both simulated and real datasets. Moreover, we demonstrate that our package is a competitive lasso solver as well, matching the performance of the popular lasso package glmnet.
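As a rough illustration of the Newton-based block update mentioned above, the following minimal sketch assumes the block subproblem has already been rotated to the eigenbasis, so that it reads: minimize $\frac{1}{2}\sum_i d_i b_i^2 - \sum_i v_i b_i + \lambda \|b\|_2$ with all $d_i > 0$ and $\lambda > 0$. Under that assumption, the stationarity condition reduces to a one-dimensional root find in $h = \|b\|_2$ for $\varphi(h) = \sum_i v_i^2 / (d_i h + \lambda)^2 - 1$. The function name `solve_rotated_block`, the parameter names, and this particular form of $\varphi$ are assumptions of the sketch (derived from the stationarity condition, not quoted from the paper), and it is not the adelie implementation.

```python
import math

def solve_rotated_block(d, v, lam, tol=1e-12, max_iter=100):
    """Minimize 0.5*sum(d_i*b_i^2) - sum(v_i*b_i) + lam*||b||_2.

    Assumes the block is already in the eigenbasis: `d` holds strictly
    positive eigenvalues, `v` the rotated linear term, and lam > 0.
    (Hypothetical helper for illustration only.)
    """
    v_norm = math.sqrt(sum(vi * vi for vi in v))
    if v_norm <= lam:
        return [0.0] * len(v)  # the penalty zeroes out the whole block

    h = 0.0  # h plays the role of ||b||_2; phi(0) > 0 in this branch
    for _ in range(max_iter):
        phi = sum(vi * vi / (di * h + lam) ** 2 for di, vi in zip(d, v)) - 1.0
        if abs(phi) <= tol:
            break
        dphi = -2.0 * sum(di * vi * vi / (di * h + lam) ** 3
                          for di, vi in zip(d, v))
        h -= phi / dphi  # Newton step; phi is convex and decreasing, so h grows

    # Recover the coordinates from the root h.
    return [h * vi / (di * h + lam) for di, vi in zip(d, v)]
```

Because $\varphi$ is convex and decreasing on $h \ge 0$, Newton iterates started at $h = 0$ approach the root from the left, so no line search is needed in this sketch. To use it on an unrotated block, one would first eigendecompose the block Gram matrix and rotate the linear term accordingly.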

Paper Structure

This paper contains 26 sections, 2 theorems, 53 equations, 7 figures, 1 table, 4 algorithms.

Key Result

Theorem 3.1

Consider the optimization problem of eq:algorithm:bcd-general with $\lambda > 0$. If the following condition holds, then a finite minimizer exists. Moreover, the minimizer is given by

Figures (7)

  • Figure 1: A comparison of the coefficient profiles for the group lasso, the group elastic net, and the ridge on the Leukemia dataset of Golub et al. (1999). From left to right, we gradually see more features entering the model with an overall shrinkage of the coefficients towards zero until the ridge includes every feature.
  • Figure 2: Time and accuracy comparison between the proximal gradient methods and our Newton's method. Overall, the proximal gradient methods suffer from large computational overhead due to the large number of iterations, and fail to converge as the dimension increases. In contrast, Newton's method is quite stable, converges properly, and runs 10 to 1000 times faster.
  • Figure 3: Plot of the iterates for the vanilla Newton's method (left) and Newton-ABS (right). In both plots, we display the iterates by their value of $\varphi$ and describe their path by the color gradient scheme. The iterates are guaranteed to move from left to right. The solution is marked with a star. For visualization purposes, the axes were scaled using the symmetrical log transformation, which deceptively makes $\varphi$ look non-convex. It is clear that Newton's method struggles where there is a sharp decay near the origin as the Newton iterates slowly exit the kink. Newton-ABS gets around this problem by finding a good initial point sufficiently away from the origin using our adaptive bisection strategy.
  • Figure 4: Timing comparisons for solving the group lasso under the Gaussian loss (\ref{fig:benchmark:group-lasso-sim:gaussian}) and the Binomial loss (\ref{fig:benchmark:group-lasso-sim:binomial}) against existing R packages. We study a small ($n=100$) and large ($n=1000$) sample size case, and for each case we vary the (equi-)correlation of the features ($\rho$).
  • Figure 5: Timing comparisons for solving the group lasso for the real datasets against existing R packages.
  • ...and 2 more figures
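
The caption of Figure 3 attributes the slow start of vanilla Newton to the sharp decay of $\varphi$ near the origin. The sketch below illustrates the general idea of starting Newton away from the origin. It assumes $\varphi(h) = \sum_i v_i^2 / (d_i h + \lambda)^2 - 1$ with all $d_i > 0$ and $\|v\|_2 > \lambda > 0$, brackets the root between $(\|v\|_2 - \lambda)/d_{\max}$ and $(\|v\|_2 - \lambda)/d_{\min}$, and runs a fixed number of bisection steps before switching to Newton. The form of $\varphi$, the bracket, the fixed step count `n_bisect`, and the function name are assumptions of this sketch; the paper's Newton-ABS uses its own adaptive rule, which is not reproduced here.

```python
import math

def newton_with_bracketed_start(d, v, lam, n_bisect=3, tol=1e-12, max_iter=100):
    """Find h > 0 with sum(v_i^2 / (d_i*h + lam)^2) = 1.

    Assumes d_i > 0 and ||v||_2 > lam > 0. (Hypothetical helper; a
    simplified stand-in for an adaptive bisection start, not the paper's rule.)
    """
    def phi(h):
        return sum(vi * vi / (di * h + lam) ** 2 for di, vi in zip(d, v)) - 1.0

    def dphi(h):
        return -2.0 * sum(di * vi * vi / (di * h + lam) ** 3
                          for di, vi in zip(d, v))

    v_norm = math.sqrt(sum(vi * vi for vi in v))
    lo = (v_norm - lam) / max(d)  # phi(lo) >= 0: at or left of the root
    hi = (v_norm - lam) / min(d)  # phi(hi) <= 0: at or right of the root

    # A few bisection steps push the left endpoint toward the root.
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if phi(mid) >= 0.0:
            lo = mid
        else:
            hi = mid

    # Newton from the left endpoint; phi is convex and decreasing, so the
    # iterates move rightward toward the root.
    h = lo
    for _ in range(max_iter):
        p = phi(h)
        if abs(p) <= tol:
            break
        h -= p / dphi(h)
    return h
```

Keeping the left endpoint as the Newton start preserves the left-to-right movement of the iterates described in the caption, while skipping the region near the origin where the Newton steps are small.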

Theorems & Definitions (2)

  • Theorem 3.1: Sufficient Condition for the Existence of the Minimizer of \ref{eq:algorithm:bcd-general}
  • Theorem 3.2: Convergence Result of Newton's Method