Prevalidated ridge regression is a highly-efficient drop-in replacement for logistic regression for high-dimensional data

Angus Dempster, Geoffrey I. Webb, Daniel F. Schmidt

TL;DR

Logistic regression (LR) requires careful hyperparameter tuning and can be computationally intensive in high dimensions, while ridge regression is fast but does not produce probabilistic outputs. The authors propose PreVal, a ridge-based classifier whose coefficients are scaled by a factor κ chosen to minimise log-loss on prevalidated leave-one-out cross-validation (LOOCV) predictions, which yields probabilistic outputs with minimal hyperparameter overhead. Using a singular value decomposition (SVD) of the data and a joint optimisation over κ and the ridge penalty λ, PreVal matches the predictive performance of regularised LR (in both 0–1 loss and log-loss) across 273 high-dimensional datasets while being substantially faster to train (up to 1000× in some settings). This makes PreVal a practical drop-in replacement for LR in applications with many features (large p), offering efficient probabilistic predictions without extensive cross-validation or tuning.
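An SVD is what makes a search over λ cheap: after one thin SVD ${\bf X} = {\bf U}{\bf S}{\bf V}^{\rm T}$, the ridge fit and the hat-matrix diagonal for any λ follow from the singular values alone. The NumPy sketch below illustrates this under our own assumptions (λ > 0, no unpenalised intercept, and a helper name `loocv_predictions_svd` that is hypothetical rather than taken from the paper). It applies the standard leave-one-out identity $\hat{y}_{-i} = y_i - e_i/(1 - d_i)$ for every λ on a grid.

```python
import numpy as np

def loocv_predictions_svd(X, y, lambdas):
    """LOOCV ridge predictions for each lambda in `lambdas` (hypothetical helper).

    Assumes lambda > 0 and no unpenalised intercept (centre X and y first if needed).
    Returns an array of shape (len(lambdas), n).
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)  # thin SVD X = U diag(s) V^T, computed once
    s2 = s ** 2
    Uty = U.T @ y
    U2 = U ** 2
    preds = []
    for lam in lambdas:
        shrink = s2 / (s2 + lam)          # eigenvalues of the ridge hat matrix
        y_hat = U @ (shrink * Uty)        # in-sample ridge fit X beta_lambda
        d = U2 @ shrink                   # d_i(lambda): diagonal of the hat matrix
        e = y - y_hat                     # ridge residuals e(lambda)
        preds.append(y - e / (1.0 - d))   # leave-one-out identity
    return np.asarray(preds)

# Usage on synthetic data with p > n.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))        # n = 50 observations, p = 200 features
y = rng.choice([-1.0, 1.0], size=50)
Z = loocv_predictions_svd(X, y, np.logspace(-2, 3, 11))
print(Z.shape)  # (11, 50)
```

Because only the length-r vector `shrink` changes with λ, each additional λ costs O(nr) operations rather than a new matrix factorisation.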

Abstract

Logistic regression is a ubiquitous method for probabilistic classification. However, the effectiveness of logistic regression depends upon careful and relatively computationally expensive tuning, especially for the regularisation hyperparameter, and especially in the context of high-dimensional data. We present a prevalidated ridge regression model that closely matches logistic regression in terms of classification error and log-loss, particularly for high-dimensional data, while being significantly more computationally efficient and having effectively no hyperparameters beyond regularisation. We scale the coefficients of the model so as to minimise log-loss for a set of prevalidated predictions derived from the estimated leave-one-out cross-validation error. This exploits quantities already computed in the course of fitting the ridge regression model in order to find the scaling parameter with nominal additional computational expense.
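Once the prevalidated predictions are available, the scaling step described in the abstract reduces to a one-dimensional convex problem. The sketch below assumes (this is our assumption, not necessarily the paper's encoding) that the ridge model was fitted to labels coded as $y_i \in \{-1, +1\}$ and that the probability model is $P(y_i = +1) = \sigma(\kappa z_i)$, where $z_i$ is the prevalidated (LOOCV) prediction for observation $i$ and σ is the logistic function. The helper names are hypothetical, and bisection on the derivative is simply one robust choice of 1-D solver; the paper may use a different one.

```python
import numpy as np

def sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def prevalidated_log_loss(kappa, y, z):
    """Mean log-loss of P(y = +1) = sigmoid(kappa * z), labels y in {-1, +1}."""
    return np.mean(np.logaddexp(0.0, -y * kappa * z))

def fit_kappa(y, z, kappa_max=1e3, iters=200):
    """Minimise the convex log-loss over kappa in [0, kappa_max] (hypothetical helper).

    z holds prevalidated (LOOCV) ridge predictions; the minimiser is found by
    bisection on the derivative, which is increasing in kappa.
    """
    a = y * z

    def grad(k):
        # Derivative of the mean log-loss with respect to kappa.
        return -np.mean(a * sigmoid(-k * a))

    lo, hi = 0.0, kappa_max
    if grad(hi) <= 0:        # loss still decreasing: predictions (nearly) separable
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Usage with synthetic stand-ins for the LOOCV predictions.
rng = np.random.default_rng(1)
z = rng.standard_normal(500)
y = np.where(rng.random(500) < sigmoid(2.0 * z), 1.0, -1.0)
kappa = fit_kappa(y, z)
print(kappa, prevalidated_log_loss(kappa, y, z))
```

Under the same assumptions, a probability for a new point would be $\sigma(\kappa\,{\bf x}^{\rm T}\hat{\bm{\beta}}_\lambda)$. When the prevalidated predictions separate the classes perfectly, the log-loss keeps decreasing in κ, which is why the search is capped at `kappa_max`.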

Paper Structure (32 sections, 3 theorems, 32 equations, 11 figures, 1 algorithm)

Key Result

Theorem 1

Let $\hat{\bm{\beta}}_\lambda$ be a ridge regression estimate on data ${\bf X}$ and ${\bf y}$, and let ${\bf e}(\lambda) = {\bf y} - {\bf X} \hat{\bm{\beta}}_\lambda$ be the residuals of the ridge regression. Then the LOOCV prediction for observation $i$ is

$$\hat{y}_{-i}(\lambda) = y_i - \frac{e_i(\lambda)}{1 - d_i(\lambda)},$$

where $d_i(\lambda) = {\bf x}_i^{\rm T} ({\bf X}^{\rm T} {\bf X} + \lambda {\bf I})^{-1} {\bf x}_i$ is the $i$-th diagonal element of the hat matrix ${\bf X}({\bf X}^{\rm T}{\bf X} + \lambda{\bf I})^{-1}{\bf X}^{\rm T}$.
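A quick numerical check of the LOOCV identity $y_i - e_i(\lambda)/(1 - d_i(\lambda))$: the NumPy sketch below (function names are hypothetical) computes the leave-one-out predictions once from the full-data residuals and hat-matrix diagonal, and once by refitting ridge regression n times without an intercept. The explicit matrix inverse is used only for readability; a Cholesky solve or an SVD would be preferable in practice.

```python
import numpy as np

def loocv_ridge_shortcut(X, y, lam):
    """LOOCV predictions from a single ridge fit: y_i - e_i / (1 - d_i)."""
    p = X.shape[1]
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))   # (X^T X + lam I)^{-1}
    beta = A_inv @ X.T @ y
    e = y - X @ beta                                    # ridge residuals e(lambda)
    d = np.einsum("ij,jk,ik->i", X, A_inv, X)           # d_i = x_i^T (X^T X + lam I)^{-1} x_i
    return y - e / (1.0 - d)

def loocv_ridge_bruteforce(X, y, lam):
    """Refit ridge n times, each time leaving one observation out."""
    n, p = X.shape
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta_i = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        preds[i] = X[i] @ beta_i
    return preds

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))
y = rng.standard_normal(30)
print(np.allclose(loocv_ridge_shortcut(X, y, 2.0), loocv_ridge_bruteforce(X, y, 2.0)))  # True
```

The shortcut needs one fit instead of n, which is the source of the "nominal additional computational expense" claimed in the abstract.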

Figures (11)

  • Figure 1: Learning curves (log-loss) for PreVal (blue), LR (orange), ridge regression (green), and naïvely scaled ridge regression (red), for increasing numbers of features, $p \in \{2^{8}, 2^{9}, \dots, 2^{14}\}$, for a random projection of the MNIST dataset. PreVal requires a fraction of the compute while closely matching the log-loss of LR in most scenarios, often showing an advantage when $p$ is large relative to $n$.
  • Figure 2: Pairwise 0–1 loss (left), log-loss (centre), and training time (right) for PreVal vs LR on tabular datasets (with interactions). PreVal closely matches LR for both 0–1 loss and log-loss on most datasets while requiring a fraction of the training time.
  • Figure 3: Pairwise 0–1 loss (left), log-loss (centre), and training time (right) for PreVal vs LR on microarray datasets. PreVal closely matches LR in terms of both 0–1 loss and log-loss while requiring only a small fraction of the training time.
  • Figure 4: Learning curves (0–1 loss) for PreVal (blue) and LR (orange) for a random projection of the MNIST dataset. PreVal achieves consistently lower 0–1 loss for small $n$. Asymptotic 0–1 loss for PreVal approaches that of LR as $p$ grows.
  • Figure 5: Pairwise 0–1 loss (left), log-loss (centre), and training time (right) for PreVal vs LR on time series datasets. PreVal closely matches LR in terms of both 0–1 loss and log-loss while requiring only a fraction of the training time.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Lemma 3