A Novel Data-Dependent Learning Paradigm for Large Hypothesis Classes

Alireza F. Pour, Shai Ben-David

TL;DR

This work tackles learning when the candidate hypothesis set is too large for uniform convergence by introducing a data-dependent growth parameter $\tau_{\mathbb{H}}(m)$ that quantifies how many distinct behaviours a class induces on a sample. It replaces fixed-a-priori class weights (as in SRM) with compression-based, data-dependent guarantees that adapt to the effective complexity of the classes containing near-optimal hypotheses. The framework encompasses partial concepts and a variety of prior-knowledge forms, including hierarchical clustering, similarity graphs, nearest-neighbor priors, Lipschitz real-valued functions, smoothness/margin conditions, and contrastive assumptions, delivering risk bounds that scale with data-driven complexity measures such as VC dimensions and $\tau_{\mathbb{H}}(\bullet)$. These results yield non-uniform, distribution-dependent generalization guarantees that can outperform traditional SRM in settings with favorable structure, and they provide concrete tools for incorporating prior knowledge without requiring exact parameter values. The approach offers a unifying compression-based perspective for non-uniform learning across diverse structured priors with practical implications for large-scale hypothesis classes.
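To make the phrase "how many distinct behaviours a class induces on a sample" concrete, the sketch below counts the distinct label vectors that a small class produces on a fixed sample. It illustrates only the classical counting idea, not the paper's definition of $\tau_{\mathbb{H}}(m)$, which this summary does not state. The threshold class, the sample, and the helper names `induced_behaviours` and `threshold_class` are all assumptions chosen for illustration.

```python
def induced_behaviours(hypotheses, sample):
    """Return the set of distinct label vectors the hypotheses produce on the sample."""
    return {tuple(h(x) for x in sample) for h in hypotheses}


def threshold_class(thresholds):
    """Hypothetical toy class: h_t(x) = 1 if x >= t, else 0."""
    # Bind t as a default argument so each lambda keeps its own threshold.
    return [lambda x, t=t: int(x >= t) for t in thresholds]


# Illustrative sample of m = 4 points on the real line (an assumption).
sample = [0.1, 0.4, 0.7, 0.9]

# 101 threshold hypotheses with t = 0.00, 0.01, ..., 1.00.
hypotheses = threshold_class([i / 100 for i in range(101)])

behaviours = induced_behaviours(hypotheses, sample)
num_behaviours = len(behaviours)

print(f"hypotheses: {len(hypotheses)}")
print(f"distinct behaviours on the sample: {num_behaviours}")
print(f"all possible labelings: {2 ** len(sample)}")
```

Although the class has 101 members, it realises only a handful of the 16 possible labelings of the four points. The number of realised behaviours, not the raw size of the class, is the kind of quantity that a sample-based complexity measure tracks.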

Abstract

We address the general task of learning with a set of candidate models that is too large to admit uniform convergence of empirical estimates to true losses. While the common approach to such challenges is SRM (or regularization) based learning algorithms, we propose a novel learning paradigm that relies on stronger incorporation of empirical data and requires fewer algorithmic decisions to be based on prior assumptions. We analyze the generalization capabilities of our approach and demonstrate its merits under several common learning assumptions, including similarity of close points, clustering of the domain into highly label-homogeneous regions, Lipschitzness assumptions on the labeling rule, and contrastive learning assumptions. Our approach allows utilizing such assumptions without the need to know their true parameters a priori.

Paper Structure

This paper contains 16 sections, 24 theorems, 45 equations.

Key Result

Theorem 3

Let $\mathbb{H}$ be a collection of concept classes. There exists a learner $\mathcal{A}_{\mathbb{H}}: (\mathcal{X} \times \{0,1\})^* \times \mathcal{X} \rightarrow \{0,1\}$, with the following property: for every distribution $\mathcal{D}$, every $\delta \in (0,1)$ and $m \in \mathbb{N}$ we have wi

Theorems & Definitions (43)

  • Definition 1
  • Definition 2: Growth parameter $\tau_{\mathbb{H}}$
  • Theorem 3
  • Corollary 4
  • Theorem 5
  • Proposition 6
  • Claim 7
  • proof
  • Lemma 8
  • proof
  • ...and 33 more