Optimal Cross-Validation for Sparse Linear Regression

Ryan Cory-Wright; Andrés Gómez

TL;DR

This work tackles the computational burden of hyperparameter tuning in ridge-regularized sparse linear regression with an $\ell_0$ constraint by developing convex, perspective-relaxation-based bounds on the $k$-fold cross-validation error. These relaxations yield tractable upper and lower bounds that remove the need to solve a mixed-integer optimization problem (MIO) for every fold and every hyperparameter combination, enabling a branch-and-bound scheme and a cyclic coordinate descent procedure that efficiently optimize the sparsity and ridge hyperparameters $(\tau,\gamma)$. Empirically, the approach reduces the number of MIOs solved by 50–80% and achieves 10–30% lower cross-validation error than grid search with MCP or GLMNet across real datasets, with SP-dominated CV performance in overdetermined regimes but some caveats in underdetermined settings. The framework thus offers a practical path to high-quality sparse models with improved generalization at significantly lower computational cost, and it extends naturally to hold-out validation.

Abstract

Given a high-dimensional covariate matrix and a response vector, ridge-regularized sparse linear regression selects a subset of features that explains the relationship between covariates and the response in an interpretable manner. To select the sparsity and robustness of linear regressors, techniques like $k$-fold cross-validation are commonly used for hyperparameter tuning. However, cross-validation substantially increases the computational cost of sparse regression, as it requires solving many mixed-integer optimization problems (MIOs) for each hyperparameter combination. To improve upon this state of affairs, we obtain computationally tractable relaxations of $k$-fold cross-validation metrics, which facilitate hyperparameter selection after solving 50–80% fewer MIOs in practice. These relaxations result in an efficient cyclic coordinate descent scheme that achieves 10–30% lower validation errors than traditional methods such as grid search with MCP or GLMNet across a suite of 13 real-world datasets.
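To make the cost described in the abstract concrete, the following is a minimal sketch of the grid-search baseline, not the method proposed in the paper. It makes several assumptions: exhaustive enumeration of supports stands in for a MIO solver (so it only works for very small $p$), the ridge term is scaled as $\frac{\gamma}{2}\|\bm{\beta}\|_2^2$ (the exact scaling used in the paper may differ), and the function names `sparse_ridge`, `kfold_cv_error`, and `grid_search` are hypothetical.

```python
import itertools
import numpy as np


def sparse_ridge(X, y, tau, gamma):
    """Brute-force stand-in for a MIO solver (assumption, tiny p only).

    Minimizes ||X b - y||^2 + (gamma / 2) * ||b||^2 subject to ||b||_0 <= tau
    by enumerating every support of size tau (requires 1 <= tau <= p).
    """
    p = X.shape[1]
    best_obj, best_beta = np.inf, np.zeros(p)
    for support in itertools.combinations(range(p), tau):
        S = list(support)
        XS = X[:, S]
        # Ridge solution restricted to S: (XS'XS + (gamma/2) I) b = XS'y
        bS = np.linalg.solve(XS.T @ XS + 0.5 * gamma * np.eye(tau), XS.T @ y)
        obj = np.sum((XS @ bS - y) ** 2) + 0.5 * gamma * np.sum(bS ** 2)
        if obj < best_obj:
            best_obj = obj
            best_beta = np.zeros(p)
            best_beta[S] = bS
    return best_beta


def kfold_cv_error(X, y, tau, gamma, k):
    """Mean squared validation error over k contiguous folds."""
    n = X.shape[0]
    total = 0.0
    for val_idx in np.array_split(np.arange(n), k):
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        beta = sparse_ridge(X[train_idx], y[train_idx], tau, gamma)
        total += np.sum((y[val_idx] - X[val_idx] @ beta) ** 2)
    return total / n


def grid_search(X, y, taus, gammas, k):
    """Evaluate every (tau, gamma) pair; count the sparse regressions solved."""
    solves = 0
    best = (np.inf, None, None)
    for tau in taus:
        for gamma in gammas:
            err = kfold_cv_error(X, y, tau, gamma, k)
            solves += k  # one sparse regression per fold
            if err < best[0]:
                best = (err, tau, gamma)
    return best, solves
```

In this sketch the number of sparse regressions grows as $k$ times the grid size, and each one would be a MIO in a realistic setting. That product is the cost the relaxation-based bounds are meant to reduce.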

Paper Structure (23 sections, 7 theorems, 37 equations, 3 figures, 3 tables, 2 algorithms)

Introduction
The Cross-Validation Paradigm:
Our Approach:
Literature Review
Hyperparameter Selection Techniques for Machine Learning Problems:
Bilevel Optimization for Hyperparameter Selection:
Structure
Convex Relaxations of $k$-fold Cross-Validation Error
Bounds on the Prediction Spread
Closed-form Bounds on the Prediction Spread
Further Improvements for Lower Bounds
Optimizing the Cross-Validation Loss
Parametric Optimization of $k$-fold With Respect to Sparsity
Algorithm \ref{alg:parametricK} in Action:
Parametric Optimization of $k$-fold Error With Respect to $\gamma$
...and 8 more sections

Key Result

Theorem 1

Given any $0 < \gamma$ and any bound the inequality holds, where $\bm{\beta}_{MIO}^*$ is an optimal solution of \eqref{eq:MIPII} and $\bm{\beta}_{persp}^*$ is optimal to \eqref{eq:persp}.

Figures (3)

Figure 1: Comparison of initial bounds on LOOCV ($k$-fold with $k=n$) from Algorithm \ref{alg:bounds} (left) and bounds after running Algorithm \ref{alg:parametricK} (right) for a synthetic sparse regression instance where $p=20, n=200, \tau_{\text{true}}=10$, for varying $\tau$. The black number in the top middle depicts the iteration number of the method.
Figure 2: Reduction in the number of MIOs solved (left) and the total number of branch-and-bound nodes (right) when using Algorithm \ref{alg:parametricK} for leave-one-out cross-validation, compared with Grid (i.e., independently solving $\mathcal{O}(pn)$ MIOs), on four real datasets. The distributions shown in the figure correspond to solving the same instance with different values of $\gamma$. All MIOs are solved to optimality, without imposing any time limits.
Figure 3: Reduction in the number of MIOs solved (left) and the total number of branch-and-bound nodes (right) when using Algorithm \ref{alg:parametricK} for 10-fold cross-validation, compared with Grid (i.e., independently solving $\mathcal{O}(pk)$ MIOs), on four real datasets. The distributions shown in the figure correspond to solving the same instance with different values of $\gamma$. All MIOs are solved to optimality, without imposing any time limits.

Theorems & Definitions (12)

Theorem 1
Remark 1: Computability of the bounds
Theorem 2
Corollary 1
Corollary 2
Remark 2: Relaxation Tightness
Remark 3: Intuition
Proposition 1
Proposition 2
Corollary 3
...and 2 more
