A Regularization-Sharpness Tradeoff for Linear Interpolators

Qingyi Hu, Liam Hodgkinson

TL;DR

A regularization-sharpness tradeoff for overparameterized linear regression with an $\ell^p$ penalty is proposed, demonstrating how the tradeoff terms can distinguish performant linear interpolators from weaker ones.

Abstract

The rule of thumb relating the bias-variance tradeoff to model size plays a key role in classical machine learning, but is now well known to break down in the overparameterized setting, as shown by the double descent curve. In particular, minimum-norm interpolating estimators can perform well, suggesting the need for a new tradeoff in these settings. Accordingly, we propose a regularization-sharpness tradeoff for overparameterized linear regression with an $\ell^p$ penalty. Inspired by the interpolating information criterion, our framework decomposes the selection penalty into a regularization term (quantifying the alignment of the regularizer and the interpolator) and a geometric sharpness term on the interpolating manifold (quantifying the effect of local perturbations), yielding a tradeoff analogous to bias-variance. Building on prior analyses that established this information criterion for ridge regularizers, this work first provides a general expression of the interpolating information criterion for $\ell^p$ regularizers where $p \ge 2$. Subsequently, we extend this to the LASSO interpolator with an $\ell^1$ regularizer, which induces stronger sparsity. Empirical results on real-world datasets with random Fourier features and polynomials validate our theory, demonstrating how the tradeoff terms can distinguish performant linear interpolators from weaker ones.
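To make the setting concrete, the following is a minimal sketch of a minimum-norm linear interpolator built on random Fourier features. It covers only the $p = 2$ case, where the minimum $\ell^2$-norm interpolator has a closed form through the pseudoinverse; the $\ell^p$ ($p > 2$) and $\ell^1$ interpolators discussed above would need a convex solver instead. The function names, the synthetic data, the bandwidth, and the feature count are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def random_fourier_features(X, num_features, bandwidth=1.0, seed=0):
    """Map inputs X of shape (n, d) to random Fourier features (n, num_features).

    Hypothetical helper: cosine features with Gaussian frequencies and
    uniform phases, which approximate a Gaussian kernel.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / bandwidth, size=(X.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


def min_l2_norm_interpolator(Phi, y):
    """Minimum l2-norm solution of Phi @ theta = y.

    Assumes Phi has more columns than rows and full row rank, so that
    exact interpolation is possible.
    """
    return np.linalg.pinv(Phi) @ y


# Synthetic data (an assumption for illustration): 20 samples, 3 inputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)

# Overparameterized regime: 200 features for 20 samples.
Phi = random_fourier_features(X, num_features=200)
theta = min_l2_norm_interpolator(Phi, y)

# Largest absolute training error; near zero when the data are interpolated.
train_residual = float(np.max(np.abs(Phi @ theta - y)))
```

Because there are more features than samples, infinitely many parameter vectors fit the training data exactly. The pseudoinverse selects the one with the smallest $\ell^2$ norm, so adding any null-space direction of `Phi` to `theta` keeps the fit exact but increases the norm.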

Paper Structure

This paper contains 28 sections, 15 theorems, 125 equations, 4 figures, 2 tables.

Key Result

Lemma 4.2

There exists a dual model with a dual prior $\pi^\ast$ and a dual likelihood $p^\ast(Z) = c_{n,\gamma} e^{-\frac{1}{\gamma}\sum_{i=1}^n \ell(z_i,y_i)}$ whose marginal likelihood $Z_n^\ast$ satisfies $Z_n = Z_n^\ast$. Consequently, as $\gamma \to 0^+$, $Z_n \to \pi^\ast(Y)$.

Figures (4)

  • Figure 1: Decomposition of the Interpolating Information Criterion (green) for minimum $\ell^3$-norm interpolating solutions using random Fourier features as a tradeoff between the effect of regularization (purple) and local sharpness (blue).
  • Figure 2: Decomposition of the Interpolating Information Criterion (green) for minimum $\ell^p$-norm interpolating solutions (with varying $p$) using random Fourier features as a tradeoff between the effect of regularization (purple) and local sharpness (blue).
  • Figure 3: Decomposition of the Interpolating Information Criterion (green) for minimum $\ell^p$-norm interpolating solutions (with varying $p$) using polynomial features as a tradeoff between the effect of regularization (purple) and local sharpness (blue). These estimators perform poorly.
  • Figure 4: Decomposition of the Interpolating Information Criterion (green) for minimum $\ell^1$-norm interpolating solutions using random Fourier features as a tradeoff between the effect of regularization (purple) and local sharpness (blue). This plot uses the FLIR dataset and randomly selects one sample to illustrate the particular result in Corollary \ref{thm:iic_p1_n1}.

Theorems & Definitions (24)

  • Definition 4.1: IIC (Hodgkinson et al., 2023)
  • Lemma 4.2: Bayesian Duality
  • Theorem 4.4: (Hodgkinson et al., 2023)
  • Theorem 4.5
  • Theorem 4.6
  • Corollary 4.7
  • Theorem A.1: Laplace Approximation
  • Lemma B.1: Proposition 1 of Hodgkinson et al. (2023)
  • Lemma C.1
  • proof
  • ...and 14 more