Statistical learning theory and Occam's razor: The core argument
Tom F. Sterkenburg
TL;DR
This paper argues for a model-relative, means-ends justification of Occam's razor grounded in statistical learning theory. It explains how, for binary classification, empirical risk minimization (ERM), uniform convergence, and the VC dimension together yield learning guarantees that favor simpler hypothesis classes. The paper acknowledges that no-free-lunch theorems rule out a universal (absolute) justification, but shows how the quantitative trade-off embodied in structural risk minimization (SRM) can automate a principled simplicity bias that remains aligned with prior knowledge. It places this argument within an epistemological framework that ties theory to practice, giving a pragmatic account of why simpler inductive models often generalize better and how SRM puts the simplicity preference to work in real learning problems.
Abstract
Statistical learning theory is often associated with the principle of Occam's razor, which recommends a simplicity preference in inductive inference. This paper distills the core argument for simplicity obtainable from statistical learning theory, built on the theory's central learning guarantee for the method of empirical risk minimization. This core "means-ends" argument is that a simpler hypothesis class or inductive model is better because it has better learning guarantees; however, these guarantees are model-relative and so the theoretical push towards simplicity is checked by our prior knowledge.
