Optimal Cross-Validation for Sparse Linear Regression
Ryan Cory-Wright, Andrés Gómez
TL;DR
This work tackles the computational burden of hyperparameter tuning in ridge-regularized sparse linear regression with an $\ell_0$ constraint by developing convex, perspective-relaxation-based bounds on the $k$-fold cross-validation error. These relaxations yield tractable upper and lower bounds, so that a mixed-integer optimization problem (MIO) need not be solved for every fold and every hyperparameter value; they enable a branch-and-bound scheme and a cyclic coordinate-descent procedure that efficiently optimize the sparsity and ridge hyperparameters $(\tau,\gamma)$. Empirically, the approach reduces the number of MIOs solved by 50–80% and achieves 10–30% lower cross-validation error than grid search with MCP or GLMNet across real datasets. CV performance is SP-dominated in overdetermined regimes, with some caveats in underdetermined settings. The proposed framework thus offers a practical path to high-quality sparse models with improved generalization at significantly lower computational cost, and it generalizes naturally to hold-out validation.
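For concreteness, the quantities named above can be written out as follows. This is a sketch under assumptions: the symbols $X$, $y$, $\beta$, $z$, the fold index sets $N_j$, and the $\gamma/2$ scaling of the ridge term are notational choices made here and may differ from the paper's. For fold $j$, the sparse regressor trained on the remaining folds is

$$
\beta^{(j)}(\tau,\gamma) \in \arg\min_{\beta \in \mathbb{R}^p} \; \sum_{i \notin N_j} \left(y_i - x_i^\top \beta\right)^2 + \frac{\gamma}{2}\,\|\beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_0 \le \tau,
$$

and the $k$-fold cross-validation error is

$$
\mathrm{CV}(\tau,\gamma) = \frac{1}{n} \sum_{j=1}^{k} \sum_{i \in N_j} \left(y_i - x_i^\top \beta^{(j)}(\tau,\gamma)\right)^2 .
$$

A perspective relaxation of the training problem, in its commonly used form, replaces the binary support indicators with $z \in [0,1]^p$ and the ridge term with its perspective (with the convention $0/0 = 0$):

$$
\min_{\beta \in \mathbb{R}^p,\; z \in [0,1]^p} \; \sum_{i \notin N_j} \left(y_i - x_i^\top \beta\right)^2 + \frac{\gamma}{2} \sum_{l=1}^{p} \frac{\beta_l^2}{z_l} \quad \text{s.t.} \quad \sum_{l=1}^{p} z_l \le \tau .
$$

This problem is convex, and its optimal value is a lower bound on the optimal value of the training problem for fold $j$.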
Abstract
Given a high-dimensional covariate matrix and a response vector, ridge-regularized sparse linear regression selects a subset of features that explains the relationship between covariates and the response in an interpretable manner. To select the sparsity and robustness hyperparameters of linear regressors, techniques like $k$-fold cross-validation are commonly used. However, cross-validation substantially increases the computational cost of sparse regression, as it requires solving many mixed-integer optimization problems (MIOs) for each hyperparameter combination. To address this, we obtain computationally tractable relaxations of $k$-fold cross-validation metrics, which facilitate hyperparameter selection after solving 50–80% fewer MIOs in practice. These relaxations result in an efficient cyclic coordinate descent scheme that achieves 10–30% lower validation error than traditional methods, such as grid search with MCP or GLMNet, across a suite of 13 real-world datasets.
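To make the cost of the baseline concrete, the sketch below runs plain grid search over $(\tau,\gamma)$ with $k$-fold cross-validation. It is an illustration of the expensive procedure the abstract describes, not the authors' method: each sparse regression is solved by exhaustive enumeration of supports (a stand-in for an MIO solver that is only viable for small $p$), and the function names, the $\gamma/2$ ridge scaling, and the random fold assignment are assumptions made here.

```python
import itertools
import numpy as np


def fit_sparse_ridge(X, y, tau, gamma):
    """Solve min ||y - X b||^2 + (gamma/2) ||b||^2  s.t. ||b||_0 <= tau
    by enumerating supports (assumed stand-in for an MIO solver)."""
    n, p = X.shape
    size = min(tau, p)
    best_beta = np.zeros(p)
    if size == 0:
        return best_beta
    best_obj = np.inf
    # With a ridge penalty, enlarging the support never increases the
    # objective, so checking supports of size exactly min(tau, p) suffices.
    for support in itertools.combinations(range(p), size):
        S = list(support)
        XS = X[:, S]
        # Stationarity: (XS^T XS + (gamma/2) I) b = XS^T y
        b = np.linalg.solve(XS.T @ XS + 0.5 * gamma * np.eye(size), XS.T @ y)
        r = y - XS @ b
        obj = r @ r + 0.5 * gamma * (b @ b)
        if obj < best_obj:
            best_obj = obj
            best_beta = np.zeros(p)
            best_beta[S] = b
    return best_beta


def cv_error(X, y, tau, gamma, k=5, seed=0):
    """Mean squared k-fold validation error for one (tau, gamma) pair."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    total = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        beta = fit_sparse_ridge(X[train], y[train], tau, gamma)
        total += np.sum((y[held_out] - X[held_out] @ beta) ** 2)
    return total / n


def grid_search(X, y, taus, gammas, k=5):
    """Evaluate every (tau, gamma) pair: k sparse regressions per pair."""
    scores = {(t, g): cv_error(X, y, t, g, k) for t in taus for g in gammas}
    best = min(scores, key=scores.get)
    return best, scores
```

With a grid of $|T|$ sparsity values and $|G|$ ridge values, this baseline solves $k \cdot |T| \cdot |G|$ sparse regressions, which is the count that bounds on the cross-validation error aim to reduce by ruling out hyperparameter pairs without solving them exactly.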
