Statistical learning theory and Occam's razor: The core argument

Tom F. Sterkenburg

TL;DR

This paper argues for a model-relative, means-ends justification of Occam's razor grounded in statistical learning theory. It explains how binary classification, empirical risk minimization, uniform convergence, and VC dimension combine to yield learning guarantees that favor simpler hypothesis classes. It acknowledges that no-free-lunch theorems rule out a universal (absolute) justification, but shows how a structured, quantitative trade-off, embodied in structural risk minimization (SRM), can automate a principled simplicity bias aligned with prior knowledge. The work situates this within an epistemological framework that ties theory to practice, offering a pragmatic account of why simpler inductive models often generalize better and how SRM operationalizes this preference in real learning problems.

Abstract

Statistical learning theory is often associated with the principle of Occam's razor, which recommends a simplicity preference in inductive inference. This paper distills the core argument for simplicity obtainable from statistical learning theory, built on the theory's central learning guarantee for the method of empirical risk minimization. This core "means-ends" argument is that a simpler hypothesis class or inductive model is better because it has better learning guarantees; however, these guarantees are model-relative and so the theoretical push towards simplicity is checked by our prior knowledge.
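As a rough illustration of the means-ends argument, the following sketch runs empirical risk minimization over a simple class (threshold classifiers on the unit interval, VC dimension 1) and compares it with a maximally rich "memorizer" that stores the training labels. Everything here is an assumption made for illustration and is not taken from the paper: the data distribution, the 10% label noise, the sample sizes, and the names `erm_threshold`, `memorizer`, and `make_sample` are all hypothetical.

```python
import random

def empirical_risk(h, sample):
    """Fraction of labelled examples (x, y) that hypothesis h misclassifies."""
    return sum(h(x) != y for x, y in sample) / len(sample)

def erm_threshold(sample):
    """ERM over thresholds h_t(x) = 1 if x >= t else 0 (a class of VC dimension 1)."""
    xs = sorted(x for x, _ in sample)
    # One candidate per data point plus an "all zeros" threshold: these realize
    # every labelling of the sample that any threshold classifier can produce.
    candidates = xs + [xs[-1] + 1.0]
    return min(candidates,
               key=lambda t: empirical_risk(lambda x: int(x >= t), sample))

def memorizer(sample):
    """A rich hypothesis: repeat stored training labels, predict 0 elsewhere."""
    table = dict(sample)
    return lambda x: table.get(x, 0)

def make_sample(n, true_t, noise, rng):
    """Assumed toy distribution: x uniform on [0, 1), label flipped w.p. noise."""
    sample = []
    for _ in range(n):
        x = rng.random()
        y = int(x >= true_t)
        if rng.random() < noise:
            y = 1 - y
        sample.append((x, y))
    return sample

rng = random.Random(0)
train = make_sample(200, 0.5, 0.1, rng)
test = make_sample(5000, 0.5, 0.1, rng)

t_hat = erm_threshold(train)
h_simple = lambda x: int(x >= t_hat)
h_rich = memorizer(train)

# Gap between held-out risk and training risk for each hypothesis.
gap_simple = empirical_risk(h_simple, test) - empirical_risk(h_simple, train)
gap_rich = empirical_risk(h_rich, test) - empirical_risk(h_rich, train)
print(f"threshold ERM gap: {gap_simple:.3f}, memorizer gap: {gap_rich:.3f}")
```

The memorizer fits the training sample perfectly, yet its training risk says little about its risk on fresh data, whereas for the threshold class the two quantities can be expected to stay close. The guarantee for the simple class is only useful if a good threshold classifier exists in the first place, which is the model-relative qualification the abstract stresses.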

Paper Structure (26 sections, 2 theorems, 11 equations)

This paper contains 26 sections, 2 theorems, 11 equations.

Introduction
The formal ingredients
Binary classification
Hypothesis classes and learning methods
Learnability
Uniform convergence and empirical risk minimization
The VC dimension
Bringing it all together
The notion of simplicity
Individual hypotheses and hypothesis classes
VC dimension as a measure of simplicity
The justification for simplicity
The justification for empirical risk minimization
Qualifications
The bias-complexity trade-off
...and 11 more sections

Key Result

Theorem 5

The following are equivalent:

Theorems & Definitions (6)

Definition 1
Definition 2: Learnability
Definition 3: Uniform convergence
Definition 4
Theorem 5: Fundamental theorem of statistical learning theory
Theorem 6: Fundamental theorem, quantitative version
