Efficient Agnostic Learning with Average Smoothness

Steve Hanneke; Aryeh Kontorovich; Guy Kornowski

TL;DR

This work addresses distribution-free agnostic regression under the average-smoothness framework. It derives a uniform convergence bound via bracketing entropy and presents a polynomial-time agnostic learning algorithm. The bound controls the generalization gap of average-smooth function classes without distributional assumptions, in terms of the bracketing entropy of the class and the intrinsic geometry of the space. The learner's sample complexity matches existing exponential-time guarantees, and both its preprocessing and its inference are computationally efficient. The results extend prior realizable-learning guarantees to the agnostic setting over totally bounded metric spaces; in doubling spaces they yield concrete rates such as $\widetilde{O}\left( \frac{H^{d/(d+2\beta)}}{n^{\beta/(d+2\beta)}} \right)$.
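The doubling-space rate can be read as a sample-size requirement by inverting it. The step below is a sketch under assumptions the summary does not state explicitly: $n$ is taken to be the sample size, $d$ the doubling dimension, $\beta$ the smoothness exponent, $H$ the average-smoothness bound, and $\varepsilon$ a target excess error; the logarithmic factors hidden by $\widetilde{O}$ are ignored.

$$
\frac{H^{d/(d+2\beta)}}{n^{\beta/(d+2\beta)}} \le \varepsilon
\;\iff\;
n^{\beta/(d+2\beta)} \ge \frac{H^{d/(d+2\beta)}}{\varepsilon}
\;\iff\;
n \ge \frac{H^{d/\beta}}{\varepsilon^{(d+2\beta)/\beta}} .
$$

For $\beta = 1$ this reads $n \ge H^{d}/\varepsilon^{d+2}$, again up to logarithmic factors.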

Abstract

We study distribution-free nonparametric regression following a notion of average smoothness initiated by Ashlagi et al. (2021), which measures the "effective" smoothness of a function with respect to an arbitrary unknown underlying distribution. While the recent work of Hanneke et al. (2023) established tight uniform convergence bounds for average-smooth functions in the realizable case and provided a computationally efficient realizable learning algorithm, both of these results currently lack analogs in the general agnostic (i.e. noisy) case. In this work, we fully close these gaps. First, we provide a distribution-free uniform convergence bound for average-smoothness classes in the agnostic setting. Second, we match the derived sample complexity with a computationally efficient agnostic learning algorithm. Our results, which are stated in terms of the intrinsic geometry of the data and hold over any totally bounded metric space, show that the guarantees recently obtained for realizable learning of average-smooth functions transfer to the agnostic setting. At the heart of our proof, we establish the uniform convergence rate of a function class in terms of its bracketing entropy, which may be of independent interest.
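The contrast between worst-case and average smoothness can be illustrated numerically. The sketch below is not the paper's algorithm. It assumes the usual local Hölder slope $\sup_{y \neq x} |f(x)-f(y)| / \rho(x,y)^{\beta}$, restricts the supremum to a finite sample on the real line (a simplification), and uses a hypothetical helper `local_holder_slopes` together with made-up toy data in which the function is steep only where few points lie.

```python
import numpy as np

def local_holder_slopes(x, y, beta=1.0):
    """Empirical local Hölder slope at each sample point (hypothetical helper).

    slope_i = max over j != i of |y_i - y_j| / |x_i - x_j|**beta.
    Assumes the points in x are distinct real numbers.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])   # pairwise distances
    diff = np.abs(y[:, None] - y[None, :])   # pairwise value gaps
    np.fill_diagonal(dist, np.inf)           # j == i contributes a ratio of 0
    return (diff / dist**beta).max(axis=1)

# Toy data (assumed for illustration): two dense clusters where f is flat,
# and two isolated points inside the steep ramp of f on [0.45, 0.55].
x = np.concatenate([np.linspace(0.0, 0.4, 50), [0.49, 0.51], np.linspace(0.6, 1.0, 50)])
y = np.clip((x - 0.45) / 0.1, 0.0, 1.0)

slopes = local_holder_slopes(x, y, beta=1.0)
worst_case = slopes.max()    # driven by the two low-density points
average = slopes.mean()      # empirical average over the sample
print(f"worst-case slope: {worst_case:.2f}, average slope: {average:.2f}")
```

The worst-case value is set by the pair of points in the sparse region, while the average is weighted by where the sample actually falls, which is the effect the average-smoothness notion is meant to capture.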

Paper Structure (13 sections, 6 theorems, 45 equations, 1 figure, 1 algorithm)

Introduction
Our Contributions.
Preliminaries
Setting.
Metric notions.
Bracketing.
Average smoothness (Ashlagi et al., 2021; Hanneke et al., 2023).
Generalization bounds
Efficient agnostic learning algorithm
Proofs
Proof of Theorem \ref{thm: bracket to uc}
Proof of Theorem \ref{thm: alg}
Acknowledgments.

Key Result

Theorem 1

For any metric probability space $(\Omega,\rho,\mu)$, any $\beta\in(0,1]$ and any $0<\varepsilon<H$:

Figures (1)

Figure 1: Illustration of a function and a measure $\mu$ exhibiting a large gap between "worst-case" smoothness (occurring in low-density regions) and average smoothness with respect to $\mu$; figure taken from Hanneke et al. (2023).

Theorems & Definitions (12)

Remark 1: Covering vs. bracketing
Theorem 1: Hanneke et al. (2023), Theorem 1
Remark 2: Weak average
Theorem 2
Remark 3: Other losses
Theorem 3
Remark 4: Doubling metrics
Theorem 4
Remark 5: Doubling metrics
Remark 6: Computational complexity
...and 2 more
