Monotonic Learning in the PAC Framework: A New Perspective

Ming Li; Chenyi Zhang; Qin Li

TL;DR

This work reframes generalization in agnostic PAC learning as a distributional problem by deriving a family of upper-bound risk distributions $F_m(\epsilon)$ that evolve with the training size $m$. It proves monotonicity for deterministic ERM learners in two standard settings: finite hypothesis spaces and finite VC-dimension, through corresponding $F_m(\epsilon)$ constructions and sample-complexity bounds. The authors validate the theory with experiments on three tasks (conjunctions, threshold functions, and Iris), showing empirical error distributions $P_m$ are bounded above by the PAC-derived $Q_m$, both shifting toward lower error as $m$ grows and eventually merging when $m$ is large. The findings provide a distribution-centered link between PAC theory and practical algorithmic behavior, offering conservative bounds and insights for stability under finite-sample conditions.

Abstract

Monotone learning describes learning processes in which expected performance consistently improves as the amount of training data increases. However, recent studies challenge this conventional wisdom, revealing significant gaps in the understanding of generalization in machine learning. Addressing these gaps is crucial for advancing the theoretical foundations of the field. In this work, we utilize Probably Approximately Correct (PAC) learning theory to construct a theoretical risk distribution that approximates a learning algorithm's actual performance. We rigorously prove that this theoretical distribution exhibits monotonicity as sample sizes increase. We identify two scenarios under which deterministic algorithms based on Empirical Risk Minimization (ERM) are monotone: (1) the hypothesis space is finite, or (2) the hypothesis space has finite VC-dimension. Experiments on two classical learning problems validate our findings by demonstrating that the monotonicity of the algorithms' generalization error is guaranteed, as its theoretical risk upper bound monotonically converges to 0.
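
To make the idea of a theoretical risk distribution concrete, the sketch below evaluates one possible form of it for a finite hypothesis space. It assumes the textbook agnostic bound for ERM, $P(\text{excess risk} > \epsilon) \le 2|\mathcal{H}|e^{-m\epsilon^2/2}$, and clips it into $[0,1]$; the paper's own $F_m(\epsilon)$ construction may differ. The function names and the choice $|\mathcal{H}| = 16$ are illustrative assumptions, not taken from the paper.

```python
import math


def pac_cdf_lower_bound(eps: float, m: int, h_size: int) -> float:
    """Lower bound on P(excess risk <= eps) for ERM over a finite class.

    Assumed form (textbook agnostic bound, not necessarily the paper's F_m):
        1 - min(1, 2 * |H| * exp(-m * eps^2 / 2))
    """
    delta = 2.0 * h_size * math.exp(-m * eps ** 2 / 2.0)
    return max(0.0, 1.0 - delta)


def is_monotone_in_m(eps: float, sizes: list, h_size: int) -> bool:
    """Check that the bound never decreases as the sample size grows."""
    values = [pac_cdf_lower_bound(eps, m, h_size) for m in sizes]
    return all(a <= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    sizes = [100, 500, 1000, 5000]
    for eps in (0.05, 0.1, 0.2):
        row = [round(pac_cdf_lower_bound(eps, m, 16), 4) for m in sizes]
        print(eps, row, is_monotone_in_m(eps, sizes, 16))
```

For every fixed $\epsilon$ the exponential term shrinks as $m$ grows, so under this assumed form the bound is non-decreasing in $m$, which is the kind of monotonicity the abstract describes.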

Paper Structure (19 sections, 6 theorems, 37 equations, 9 figures, 1 algorithm)

Introduction
Layout of the paper.
Preliminaries
PAC learning is monotone
The finite hypothesis case
Realizability Assumption and tighter bound
The finite VC dimension case
Experiment
Learning Problems in the Experiment
Experiment
Experiment setup.
Experiment results.
Analysis on the results.
Further discussions
Related work
...and 4 more sections

Key Result

Lemma 1

(Sample Complexity for finite $\mathcal{H}$ [shalev2014understanding]) Let $\mathcal{H}$ be a finite hypothesis space, and let every training sample be drawn i.i.d. from the problem distribution. Then the class $\mathcal{H}$ is agnostic PAC learnable using the Empirical Risk Minimization (ERM) rule with th
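
The lemma statement is cut off here, so the sketch below does not reproduce it. It instead computes the standard textbook sample-complexity bound for agnostic ERM over a finite class, $m_{\mathcal{H}}(\epsilon, \delta) \le \lceil 2\ln(2|\mathcal{H}|/\delta)/\epsilon^2 \rceil$, which is assumed (not confirmed by the text above) to be the bound the lemma refers to. The function name and the example values are hypothetical.

```python
import math


def agnostic_sample_complexity(h_size: int, eps: float, delta: float) -> int:
    """Samples sufficient for agnostic PAC learning of a finite class via ERM.

    Assumed textbook bound: ceil(2 * ln(2 * |H| / delta) / eps^2).
    """
    if h_size < 1:
        raise ValueError("h_size must be a positive integer")
    if not (0.0 < eps < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    return math.ceil(2.0 * math.log(2.0 * h_size / delta) / eps ** 2)


if __name__ == "__main__":
    # Illustrative values only: |H| = 100, accuracy 0.1, confidence 0.95.
    print(agnostic_sample_complexity(100, 0.1, 0.05))
```

The bound grows only logarithmically in $|\mathcal{H}|$ and $1/\delta$ but quadratically in $1/\epsilon$, so tightening the accuracy target dominates the cost.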

Figures (9)

Figure 1: The probability function $F_m$ with different $m$ in a finite hypothesis space.
Figure 2: The density function $f_m$ with different $m$ in a finite hypothesis space.
Figure 3: Distributions on the Boolean literal conjunction learning problem with different sample size $m$.
Figure 4: Quantitative analysis of the Boolean literal conjunction learning problem: (a) mean, (b) standard deviation, and (c) Wasserstein distance.
Figure 5: Distributions on the threshold function learning with different sample size $m$.
...and 4 more figures

Theorems & Definitions (14)

Definition 1
Definition 2
Lemma 1
Definition 3
Lemma 2
Corollary 1
Corollary 2
Example 1
Example 2
Example 3
...and 4 more
