Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias

Shuofeng Zhang; Ard Louis

TL;DR

This work provides, for overparameterized linear regression with minimum-ℓ_p interpolation (p∈(1,2]), a closed-form, high-probability characterization of how the entire ℓ_r norm family (r∈[1,p]) scales with sample size. A simple dual-ray analysis reveals a competition between a signal spike and a bulk of null coordinates in X^⊤Y, yielding a data-dependent elbow n⋆ and a universal threshold r⋆=2(p−1) that splits norms into plateauing versus growing regimes with explicit exponents. The theory extends to diagonal linear networks by mapping initialization α to an effective p_eff(α), and experiments show DLNs inherit the same elbow/threshold structure, bridging explicit and implicit bias. These results imply that norm-based generalization diagnostics can be highly sensitive to the chosen r and underlying p-bias, providing practical guidance for selecting r and interpreting norm proxies in high-dimensional interpolation tasks.

Abstract

For overparameterized linear regression with isotropic Gaussian design and the minimum-$\ell_p$ interpolator $\widehat{w_p}$ with $p\in(1,2]$, we give a unified, high-probability characterization of how the family of parameter norms $\{ \lVert \widehat{w_p} \rVert_r \}_{r \in [1,p]}$ scales with sample size $n$. We solve this basic but unresolved question through a simple dual-ray analysis, which reveals a competition between a signal *spike* and a *bulk* of null coordinates in $X^\top Y$, yielding closed-form predictions for (i) a data-dependent transition $n_\star$ (the "elbow"), and (ii) a universal threshold $r_\star=2(p-1)$ that separates the norms $\lVert \widehat{w_p} \rVert_r$ that plateau from those that continue to grow with an explicit exponent. This unified solution resolves the scaling of *all* $\ell_r$ norms within the family $r\in [1,p]$ under $\ell_p$-biased interpolation, and explains in one picture which norms saturate and which increase as $n$ grows. We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale $\alpha$ to an effective $p_{\mathrm{eff}}(\alpha)$ via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias. Given that many generalization proxies depend on $\lVert \widehat{w_p} \rVert_r$, our results suggest that their predictive power will depend sensitively on which $\ell_r$ norm is used.
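As a rough illustration of the objects named in the abstract, the sketch below computes the threshold $r_\star = 2(p-1)$ and evaluates several $\ell_r$ norms of an interpolator. It covers only the special case $p=2$, where the minimum-$\ell_2$ interpolator has a closed form via the pseudoinverse; general $p\in(1,2)$ would need a convex solver and is not attempted here. The function names, the one-sparse noiseless ground truth, and the problem sizes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def r_star(p: float) -> float:
    """Threshold r* = 2(p - 1) quoted in the abstract, for p in (1, 2]."""
    if not (1.0 < p <= 2.0):
        raise ValueError("p must lie in (1, 2]")
    return 2.0 * (p - 1.0)


def lr_norm(w: np.ndarray, r: float) -> float:
    """The l_r norm (sum_i |w_i|^r)^(1/r) for r >= 1."""
    return float(np.sum(np.abs(w) ** r) ** (1.0 / r))


def min_l2_interpolator(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Minimum-l_2-norm solution of X w = Y (the p = 2 case only)."""
    return np.linalg.pinv(X) @ Y


def demo(n: int = 50, d: int = 400, seed: int = 0):
    """Hypothetical setup: isotropic Gaussian design, d > n, noiseless labels."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_true = np.zeros(d)
    w_true[0] = 1.0  # assumed one-sparse signal, chosen for illustration
    Y = X @ w_true
    w_hat = min_l2_interpolator(X, Y)
    norms = {r: lr_norm(w_hat, r) for r in (1.0, 1.5, 2.0)}
    return X, Y, w_hat, norms
```

Calling `demo()` for a range of `n` and plotting each entry of `norms` against `n` would be one way to look for the elbow empirically. Note that $r_\star = 2(p-1)$ falls inside the interval $[1,p]$ only when $p \ge 3/2$, since $2(p-1)\ge 1$ requires $p\ge 3/2$ and $2(p-1)\le p$ holds for all $p\le 2$.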
