Provably tuning the ElasticNet across instances

Maria-Florina Balcan; Mikhail Khodak; Dravyansh Sharma; Ameet Talwalkar

TL;DR

This work addresses provable tuning of ElasticNet regularization across multiple problem instances, covering cross-validation and online hyperparameter optimization. It reveals a fundamental piecewise-algebraic structure: the ElasticNet loss is piecewise-rational in the tuning pair $(\lambda_1,\lambda_2)$ with algebraic boundaries determined by equicorrelation sets, enabling tight generalization bounds via pseudo-dimension analysis. The authors establish $\textsc{Pdim}(\mathcal{H}_{EN}) = O(p^2)$ (and similar bounds for AIC/BIC) and show online learnability with $\tilde{O}(\sqrt{T})$ regret through dispersion-based methods, also extending the results to regularized classification via thresholding. Extensions to distributional and online settings for both regression and classification demonstrate broad applicability without strong distributional assumptions. Overall, the results provide principled, provable guarantees for data-driven tuning of regularized linear models across tasks, with practical implications for cross-validation and multi-task learning.

Abstract

An important unresolved challenge in the theory of regularization is to set the regularization coefficients of popular techniques like the ElasticNet with general provable guarantees. We consider the problem of tuning the regularization parameters of Ridge regression, LASSO, and the ElasticNet across multiple problem instances, a setting that encompasses both cross-validation and multi-task hyperparameter optimization. We obtain a novel structural result for the ElasticNet which characterizes the loss as a function of the tuning parameters as a piecewise-rational function with algebraic boundaries. We use this to bound the structural complexity of the regularized loss functions and show generalization guarantees for tuning the ElasticNet regression coefficients in the statistical setting. We also consider the more challenging online learning setting, where we show vanishing average expected regret relative to the optimal parameter pair. We further extend our results to tuning classification algorithms obtained by thresholding regression fits regularized by Ridge, LASSO, or ElasticNet. Our results are the first general learning-theoretic guarantees for this important class of problems that avoid strong assumptions on the data distribution. Furthermore, our guarantees hold for both validation and popular information criterion objectives.
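The tuning problem described in the abstract can be made concrete with a minimal sketch: pick one pair $(\lambda_1,\lambda_2)$ that minimizes the validation loss averaged over several problem instances. Everything below is an illustrative assumption rather than the authors' procedure: the objective normalization $\|y - X\beta\|_2^2 + \lambda_1\|\beta\|_1 + \lambda_2\|\beta\|_2^2$, the plain coordinate-descent solver, the grid search, and the helper names `elastic_net`, `avg_validation_loss`, `tune_on_grid`, and `make_instances` are all hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net(X, y, lam1, lam2, n_iter=200):
    """Coordinate descent for ||y - X b||^2 + lam1*||b||_1 + lam2*||b||^2.

    Assumes every column of X is nonzero or lam2 > 0 (otherwise the
    per-coordinate denominator below would be zero).
    """
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, lam1 / 2.0) / (col_sq[j] + lam2)
    return beta

def avg_validation_loss(instances, lam1, lam2):
    """Mean squared validation error, averaged over problem instances."""
    losses = []
    for X_tr, y_tr, X_val, y_val in instances:
        beta = elastic_net(X_tr, y_tr, lam1, lam2)
        losses.append(np.mean((y_val - X_val @ beta) ** 2))
    return float(np.mean(losses))

def tune_on_grid(instances, grid1, grid2):
    """Return the grid pair with the smallest average validation loss."""
    scores = {(l1, l2): avg_validation_loss(instances, l1, l2)
              for l1 in grid1 for l2 in grid2}
    return min(scores, key=scores.get), scores

def make_instances(n_instances=5, n=40, p=6, seed=0):
    """Synthetic instances sharing one sparse signal (illustrative data only)."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.5])[:p]
    instances = []
    for _ in range(n_instances):
        X_tr, X_val = rng.normal(size=(n, p)), rng.normal(size=(n, p))
        y_tr = X_tr @ beta_true + 0.5 * rng.normal(size=n)
        y_val = X_val @ beta_true + 0.5 * rng.normal(size=n)
        instances.append((X_tr, y_tr, X_val, y_val))
    return instances

if __name__ == "__main__":
    best, scores = tune_on_grid(make_instances(), [0.0, 1.0, 10.0], [0.1, 1.0])
    print("selected (lambda1, lambda2):", best)
```

A grid is used here only because it is the simplest baseline; the guarantees summarized above concern how well a pair chosen on sampled instances carries over to new instances, not any particular search strategy.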

Paper Structure

This paper contains 17 sections, 22 theorems, 41 equations, 1 figure, and 1 algorithm.

Introduction
Related work
Preliminaries and a Key Structural Result
Piecewise structure of the ElasticNet loss
Learning to Regularize the ElasticNet
Distributional Setting
Online Learning
Extension to Regularized Least Squares Classification
Distributional setting
Online setting
Conclusions and Future Work
A classic Generalization Bound
Known characterization of LASSO solutions
Lemmas and proof details for Section \ref{sec:regression}
Tuning the ElasticNet -- Distributional setting
...and 2 more sections

Key Result

Lemma 2.1

Let $A$ be an $r\times s$ matrix and let $\lambda>0$. Consider the matrix $B(\lambda)=(A^TA+\lambda I_s)^{-1}$.
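The lemma statement is cut off here, so the following is only a hedged numerical illustration of a standard linear-algebra fact about the matrix it defines, not a restatement of the lemma. By the adjugate formula, each entry of $B(\lambda)$ is a ratio of polynomials in $\lambda$: the denominator is $Q(\lambda)=\det(A^TA+\lambda I_s)$, which is monic of degree $s$, and each numerator has degree at most $s-1$. The helper names `ridge_inverse`, `denominator_coeffs`, and `numerator_coeffs` are hypothetical.

```python
import numpy as np

def ridge_inverse(A, lam):
    """B(lam) = (A^T A + lam * I_s)^{-1}."""
    s = A.shape[1]
    return np.linalg.inv(A.T @ A + lam * np.eye(s))

def denominator_coeffs(A):
    """Coefficients (highest power first) of Q(lam) = det(A^T A + lam * I_s).

    Q(lam) = prod_i (lam + mu_i), where mu_i are the eigenvalues of A^T A,
    so its roots are -mu_i and it is monic of degree s.
    """
    mu = np.linalg.eigvalsh(A.T @ A)
    return np.poly(-mu)

def numerator_coeffs(A, i, j):
    """Recover the numerator of entry (i, j) by polynomial interpolation.

    Q(lam) * B(lam)[i, j] is an adjugate entry, hence a polynomial of degree
    at most s - 1, so s sample points determine it.
    """
    s = A.shape[1]
    q = denominator_coeffs(A)
    lams = np.linspace(0.5, 3.0, s)
    vals = [np.polyval(q, lam) * ridge_inverse(A, lam)[i, j] for lam in lams]
    return np.polyfit(lams, vals, s - 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 3))
    q = denominator_coeffs(A)
    p01 = numerator_coeffs(A, 0, 1)
    lam_test = 7.0  # held-out value, not used for interpolation
    rational = np.polyval(p01, lam_test) / np.polyval(q, lam_test)
    direct = ridge_inverse(A, lam_test)[0, 1]
    print("difference between rational form and direct inverse:",
          abs(rational - direct))
```

The check at a held-out $\lambda$ is what makes this meaningful: a polynomial fitted on $s$ points only reproduces the direct inverse elsewhere if the entry really is the claimed rational function.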

Figures (1)

Figure 1: An illustration of the piecewise structure of the ElasticNet loss, as a function of the regularization parameters, for a fixed problem instance. Pieces are regions where some bounded-degree polynomials ($r_1,r_2$) have a fixed sign pattern (one of $(\pm1,\pm1)$), and in each piece the loss is a fixed rational function.
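A one-feature worked example may help make the caption concrete. It is a sketch under assumptions not taken from the paper: the objective is normalized as $\|y - x\beta\|_2^2 + \lambda_1|\beta| + \lambda_2\beta^2$, there is a single validation pair $(x_v, y_v)$, and the symbols $a$, $c$, $r$, and $\ell$ are introduced here for illustration. With one feature there is a single boundary polynomial rather than the two shown in the figure.

```latex
% One-feature instance: training data x, y in R^m, validation pair (x_v, y_v).
% Assumes a + \lambda_2 > 0. Notation a, c, r, \ell is illustrative only.
\[
a = x^\top x, \qquad c = x^\top y, \qquad
\hat\beta(\lambda_1,\lambda_2)
  = \operatorname*{argmin}_{\beta \in \mathbb{R}}
    \; \|y - x\beta\|_2^2 + \lambda_1 |\beta| + \lambda_2 \beta^2
  = \frac{\operatorname{sign}(c)\,\max\{|c| - \lambda_1/2,\; 0\}}{a + \lambda_2}.
\]
\[
r(\lambda_1,\lambda_2) = 2|c| - \lambda_1, \qquad
\ell(\lambda_1,\lambda_2) = (y_v - x_v \hat\beta)^2 =
\begin{cases}
\left( y_v - x_v \,\dfrac{\operatorname{sign}(c)\,(|c| - \lambda_1/2)}{a + \lambda_2} \right)^{2},
  & r(\lambda_1,\lambda_2) > 0,\\[2ex]
y_v^{2}, & r(\lambda_1,\lambda_2) \le 0.
\end{cases}
\]
```

The sign of the polynomial $r$ selects the piece, and within each piece the validation loss is a fixed rational function of $(\lambda_1,\lambda_2)$, which matches the shape of the structure described in the caption.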

Theorems & Definitions (45)

Definition 1: Piecewise structured functions [balcan2021much]
Definition 2: Equicorrelation sets [tibshirani2013lasso]
Lemma 2.1
Theorem 2.2
proof
Theorem 3.1
Theorem 3.2: Sample complexity of tuning the ElasticNet
proof
Remark 1
Definition 3
...and 35 more
