A Regularization-Sharpness Tradeoff for Linear Interpolators

Qingyi Hu; Liam Hodgkinson

TL;DR

A regularization-sharpness tradeoff for overparameterized linear regression with an $\ell^p$ penalty is proposed, demonstrating how the tradeoff terms can distinguish performant linear interpolators from weaker ones.

Abstract

The rule of thumb relating the bias-variance tradeoff to model size plays a key role in classical machine learning, but is now well known to break down in the overparameterized setting, as shown by the double descent curve. In particular, minimum-norm interpolating estimators can perform well, suggesting the need for a new tradeoff in these settings. Accordingly, we propose a regularization-sharpness tradeoff for overparameterized linear regression with an $\ell^p$ penalty. Inspired by the interpolating information criterion, our framework decomposes the selection penalty into a regularization term (quantifying the alignment of the regularizer and the interpolator) and a geometric sharpness term on the interpolating manifold (quantifying the effect of local perturbations), yielding a tradeoff analogous to bias-variance. Building on prior analyses that established this information criterion for ridge regularizers, this work first provides a general expression of the interpolating information criterion for $\ell^p$ regularizers where $p \ge 2$. Subsequently, we extend this to the LASSO interpolator with an $\ell^1$ regularizer, which induces stronger sparsity. Empirical results on real-world datasets with random Fourier features and polynomials validate our theory, demonstrating how the tradeoff terms can distinguish performant linear interpolators from weaker ones.
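To make the notion of a minimum-norm interpolator concrete, here is a minimal sketch of the $\ell^2$ (ridgeless) case only, computed with the pseudoinverse on random Fourier features. It is an illustration under stated assumptions, not the authors' experimental code: the synthetic data, the feature scaling, and the function names `random_fourier_features` and `min_l2_norm_interpolator` are all hypothetical, and the general $\ell^p$ and $\ell^1$ interpolators studied in the paper would need a constrained optimizer instead.

```python
import numpy as np

def random_fourier_features(x, num_features, rng, scale=1.0):
    """Map inputs x of shape (n, d) to features of shape (n, num_features)."""
    d = x.shape[1]
    w = rng.normal(scale=scale, size=(d, num_features))    # random frequencies (assumed Gaussian)
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)   # random phases
    return np.sqrt(2.0 / num_features) * np.cos(x @ w + b)

def min_l2_norm_interpolator(features, y):
    """Return the minimum l2-norm solution of features @ theta = y."""
    return np.linalg.pinv(features) @ y

rng = np.random.default_rng(0)
n, d, num_features = 20, 3, 200            # overparameterized: more features than samples
x = rng.normal(size=(n, d))                # synthetic inputs (assumption, not a paper dataset)
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=n)

phi = random_fourier_features(x, num_features, rng)
theta = min_l2_norm_interpolator(phi, y)

# The fitted model reproduces the training targets up to floating-point error.
train_residual = np.max(np.abs(phi @ theta - y))

# Adding any null-space direction keeps the fit exact but increases the l2 norm.
null_projector = np.eye(num_features) - np.linalg.pinv(phi) @ phi
other = theta + null_projector @ rng.normal(size=num_features)
other_residual = np.max(np.abs(phi @ other - y))

print("max training residual:", train_residual)
print("norm of min-norm solution:", np.linalg.norm(theta))
print("norm of another interpolator:", np.linalg.norm(other))
```

Both `theta` and `other` fit the training data exactly, which is why a selection criterion for interpolators has to look beyond training error, for example at the regularization and sharpness terms described above.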

Paper Structure (28 sections, 15 theorems, 125 equations, 4 figures, 2 tables)

Introduction
Contribution
Background and Related Work
Classical vs. Interpolating Information Criteria.
Generalization Properties.
Sparse Interpolators.
Notation and Preliminaries
Linear Regression
Importance of Marginal Likelihood
Generalization Error and PAC-Bayes
IIC for Linear Interpolators
IIC with Ridge Regularization
IIC with Smooth Regularization
IIC with Sparse Regularization
Numerical Experiments
...and 13 more sections

Key Result

Lemma 4.2

There exists a dual model with a dual prior $\pi^\ast$ and a dual likelihood $p^\ast(Z) = c_{n,\gamma} e^{-\frac{1}{\gamma}\sum_{i=1}^n \ell(z_i,y_i)}$ whose marginal likelihood $Z_n^\ast$ satisfies $Z_n = Z_n^\ast$. Consequently, as $\gamma \to 0^+$, $Z_n \to \pi^\ast(Y)$.

Figures (4)

Figure 1: Decomposition of the Interpolating Information Criterion (green) for minimum $\ell^3$-norm interpolating solutions using random Fourier features as a tradeoff between the effect of regularization (purple) and local sharpness (blue).
Figure 2: Decomposition of the Interpolating Information Criterion (green) for minimum $\ell^p$-norm interpolating solutions (with varying $p$) using random Fourier features as a tradeoff between the effect of regularization (purple) and local sharpness (blue).
Figure 3: Decomposition of the Interpolating Information Criterion (green) for minimum $\ell^p$-norm interpolating solutions (with varying $p$) using polynomial features as a tradeoff between the effect of regularization (purple) and local sharpness (blue). These estimators perform poorly.
Figure 4: Decomposition of the Interpolating Information Criterion (green) for minimum $\ell^1$-norm interpolating solutions using random Fourier features as a tradeoff between the effect of regularization (purple) and local sharpness (blue). This plot uses the FLIR dataset with one randomly selected sample to illustrate the particular result in Corollary \ref{thm:iic_p1_n1}.

Theorems & Definitions (24)

Definition 4.1: IIC hodgkinson_interpolating_2023
Lemma 4.2: Bayesian Duality
Theorem 4.4: hodgkinson_interpolating_2023
Theorem 4.5
Theorem 4.6
Corollary 4.7
Theorem A.1: Laplace Approximation
Lemma B.1: Proposition 1 of hodgkinson_interpolating_2023
Lemma C.1
proof
...and 14 more
