Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification

Xiaohan Zhu; Nathan Srebro

TL;DR

This work analyzes a modified two-part-code MDL rule, MDL$_{\lambda}$, for supervised binary classification under agnostic learning and tracks the entire regularization path as $\lambda$ varies. The authors derive an exact worst-case limiting error function $\ell_{\lambda}(L^*)$, establish finite-sample PAC-Bayes-based upper bounds, and construct matching lower bounds to prove tightness. They show tempered overfitting for $\lambda\ge 1$ (and more nuanced behavior for $\lambda<1$), with consistency recovered when $\lambda$ grows as $\sqrt{m}$ or faster, but potentially catastrophic under- or over-regularization for other growth rates. The results provide a baseline for the cost of overfitting along MDL’s regularization path and offer guidance on selecting $\lambda$ in practice, with implications for understanding model complexity and generalization in discrete-prior MDL frameworks.

Abstract

We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. Grünwald and Langford [2004] previously established the lack of asymptotic consistency, from an agnostic PAC (frequentist worst-case) perspective, of the MDL rule with a penalty parameter of $\lambda=1$, suggesting that it under-regularizes. Driven by interest in understanding how benign or catastrophic under-regularization and overfitting might be, we obtain a precise quantitative description of the worst-case limiting error as a function of the regularization parameter $\lambda$ and noise level (or approximation error), significantly tightening the analysis of Grünwald and Langford for $\lambda=1$ and extending it to all other choices of $\lambda$.
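
The abstract does not spell out the objective that MDL$_{\lambda}$ minimizes, so the following is only a rough sketch of what a $\lambda$-weighted two-part code could look like. It assumes (this is not taken from the text above) that the model cost of a hypothesis $h$ is $-\log \pi(h)$ under the prior $\pi$, that the data cost is $\log\binom{m}{k}$ for $k$ training mistakes on $m$ examples, and that the rule minimizes $\lambda$ times the model cost plus the data cost. The function name `mdl_lambda_select` and the toy hypotheses are hypothetical.

```python
import math
from typing import Callable, Sequence


def mdl_lambda_select(
    hypotheses: Sequence[Callable[[int], int]],
    prior: Sequence[float],
    xs: Sequence[int],
    ys: Sequence[int],
    lam: float,
) -> int:
    """Return the index of the hypothesis with the smallest two-part code length.

    Assumed objective (illustrative only, in nats):
        lam * (-log prior[h]) + log C(m, number of training errors of h)
    """
    m = len(xs)
    best_index, best_score = -1, math.inf
    for i, (h, pi) in enumerate(zip(hypotheses, prior)):
        if pi <= 0.0:
            continue  # hypotheses with zero prior mass have infinite code length
        errors = sum(1 for x, y in zip(xs, ys) if h(x) != y)
        model_cost = -math.log(pi)                  # cost of describing h
        data_cost = math.log(math.comb(m, errors))  # cost of describing which labels h gets wrong
        score = lam * model_cost + data_cost
        if score < best_score:
            best_index, best_score = i, score
    return best_index


# Toy data: ten points, one of which carries a (possibly noisy) positive label.
xs = list(range(10))
ys = [0] * 9 + [1]

# A simple hypothesis with large prior mass and a label-fitting one with small prior mass.
hypotheses = [lambda x: 0, lambda x: 1 if x == 9 else 0]
prior = [0.5, 0.001]

heavy = mdl_lambda_select(hypotheses, prior, xs, ys, lam=1.0)  # picks index 0 (simple)
light = mdl_lambda_select(hypotheses, prior, xs, ys, lam=0.1)  # picks index 1 (fits the odd label)
```

In this toy setup, lowering $\lambda$ discounts the model cost enough that the rule switches to the hypothesis that fits every training label, which is the kind of under-regularization the paper quantifies.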
