The Hessian Screening Rule

Johan Larsson, Jonas Wallin

TL;DR

The paper addresses accelerating high‑dimensional sparse regression along the lasso path by introducing the Hessian Screening Rule, a second‑order predictor screening technique. By viewing screening as a gradient estimation task, it derives a Hessian‑based, second‑order update to predict the next step in the path, augmented with a restricted computation and a unit‑bound term, and combines it with the ever‑active set for robust screening. The approach includes efficient Hessian updates via the sweep operator, warm starts, and extensions to general convex losses and elastic net, along with strategies to reduce KKT checks and integrate Gap Safe screening. Empirical results on simulated and real data show substantial speedups over existing methods, particularly in settings with high predictor correlation, while highlighting memory trade‑offs and practical considerations for very large problems.

Abstract

Predictor screening rules, which discard predictors before fitting a model, have had considerable impact on the speed with which sparse regression problems, such as the lasso, can be solved. In this paper we present a new screening rule for solving the lasso path: the Hessian Screening Rule. The rule uses second-order information from the model to provide both effective screening, particularly in the case of high correlation, as well as accurate warm starts. The proposed rule outperforms all alternatives we study on simulated data sets with both low and high correlation for $\ell_1$-regularized least-squares (the lasso) and logistic regression. It also performs best in general on the real data sets that we examine.

Paper Structure

This paper contains 35 sections, 3 theorems, 25 equations, 14 figures, 4 tables, 2 algorithms.

Key Result

Theorem 3.1

Let $\hat{\beta}(\lambda)$ be the solution of eq:primal where $f(\beta;X)=\frac{1}{2} \lVert X\beta - y \rVert_2^2$. Define … and $\hat{\beta}^{\lambda^*}(\lambda)_{\mathcal{A}_{\lambda^*}^c} = 0.$ If, for $\lambda \in [\lambda_0, \lambda^*]$, it holds that (i) $\operatorname{sign}(\hat{\beta}^{\lambda^*}(\lambda)) = \operatorname{sign}(\hat{\beta}(\lambda^*))$ and (ii) …
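
The theorem above concerns predicting the least-squares lasso solution at a nearby $\lambda$ from the solution at $\lambda^*$. The following is a minimal sketch of that idea, not the paper's exact statement (which is truncated here). It assumes the update follows from the standard stationarity condition $X_{\mathcal{A}}^T(y - X_{\mathcal{A}}\beta_{\mathcal{A}}) = \lambda \operatorname{sign}(\beta_{\mathcal{A}})$ with the active set and signs held fixed. The names `lasso_cd` and `predict_along_path` are hypothetical helpers written for this illustration, and the coordinate descent solver is a plain reference implementation rather than the authors' algorithm.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=2000):
    """Plain cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()               # residual y - X @ beta
    col_sq = (X ** 2).sum(axis=0)  # squared column norms
    for _ in range(n_sweeps):
        for j in range(p):
            rho = X[:, j] @ resid + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def predict_along_path(X, y, beta_star, lam_star, lam):
    """Second-order prediction of the lasso solution at lam from lam_star.

    Assumption (hypothetical helper): the active set and signs at lam_star
    are kept fixed, so the active coefficients move linearly in lam.
    """
    active = beta_star != 0
    s = np.sign(beta_star[active])
    XA = X[:, active]
    H = XA.T @ XA                  # Hessian restricted to the active set
    beta_pred = np.zeros_like(beta_star)
    beta_pred[active] = beta_star[active] + (lam_star - lam) * np.linalg.solve(H, s)
    corr = X.T @ (y - X @ beta_pred)   # negative gradient at the prediction
    # Checks in the spirit of conditions (i) and (ii): signs are unchanged
    # and inactive predictors stay strictly inside the KKT bound.
    signs_kept = np.array_equal(np.sign(beta_pred), np.sign(beta_star))
    inactive_ok = np.all(np.abs(corr[~active]) < lam)
    return beta_pred, corr, active, bool(signs_kept and inactive_ok)


# Small synthetic example (the data are illustrative, not from the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(50)
lam_max = np.max(np.abs(X.T @ y))
lam_star, lam = 0.5 * lam_max, 0.49 * lam_max

beta_star = lasso_cd(X, y, lam_star)
beta_pred, corr, active, ok = predict_along_path(X, y, beta_star, lam_star, lam)
print("conditions hold:", ok)
```

When both checks pass, the predicted vector satisfies the KKT conditions at the new $\lambda$ and can be compared against a direct solve with `lasso_cd(X, y, lam)`. When they fail, the prediction is still usable as a warm start, but it is no longer guaranteed to be the solution.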

Figures (14)

  • Figure 1: The number of predictors screened (included) when fitting a regularization path of $\ell_1$-regularized least-squares to a design with varying correlation ($\rho$), $n = 200$, and $p = 20\,000$. The values are averaged over 20 repetitions. The minimum number of active predictors at each step across iterations is given as a dashed line. Note that the y-axis is on a $\log_{10}$ scale.
  • Figure 2: Number of passes of coordinate descent along a full regularization path for the colon-cancer ($n = 62$, $p = 2\,000$) and YearPredictionMSD ($n = 463\,715$, $p = 90$) data sets, using either Hessian warm starts (eq:warm-start) or standard warm starts (the solution from the previous step).
  • Figure 3: Time to fit a full regularization path for $\ell_1$-regularized least-squares and logistic regression to a design with $n$ observations, $p$ predictors, and pairwise correlation between predictors of $\rho$. Time is relative to the minimal mean time in each group. The error bars represent ordinary 95% confidence intervals around the mean.
  • Figure 4: The time in seconds required to fit a full regularization path with length given on the x axis.
  • Figure 5: Time required to fit a full regularization path for the high-dimensional scenario set up in sec:experiments for both $\ell_1$-regularized least-squares and logistic regression, with $n = 200$ and $p = 20\,000$. Both the x and y axes are on a $\log_{10}$ scale.
  • ...and 9 more figures

Theorems & Definitions (7)

  • Theorem 3.1
  • Remark 3.2
  • Remark 3.3
  • Lemma 3.4
  • proof
  • Lemma C.1
  • proof