Optimal Learners for Realizable Regression: PAC Learning and Online Learning

Idan Attias, Steve Hanneke, Alkis Kalavasis, Amin Karbasi, Grigoris Velegkas

TL;DR

The paper characterizes learnability for realizable real-valued regression under the absolute loss and designs minimax-optimal learners for both the PAC and online settings. It introduces three $\gamma$-parameterized combinatorial dimensions: the $\gamma$-OIG dimension, which characterizes PAC learnability both qualitatively and quantitatively; the $\gamma$-Graph dimension, which characterizes learnability by ERM; and the $\gamma$-DS dimension, whose finiteness is necessary for PAC learnability and is conjectured to also be sufficient. In the online setting, a dimension based on scaled Littlestone trees characterizes the minimax instance-optimal cumulative loss up to a constant factor (a factor of 2), and the paper designs an online learner that attains it. The work also clarifies the limits of existing notions in realizable regression: finite fat-shattering dimension is sufficient but not necessary for PAC learnability, while finite scaled Natarajan dimension is necessary and is conjectured not to be sufficient.

Abstract

In this work, we aim to characterize the statistical complexity of realizable regression both in the PAC learning setting and the online learning setting. Previous work had established the sufficiency of finiteness of the fat shattering dimension for PAC learnability and the necessity of finiteness of the scaled Natarajan dimension, but little progress had been made towards a more complete characterization since the work of Simon (SICOMP '97). To this end, we first introduce a minimax instance optimal learner for realizable regression and propose a novel dimension that both qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable. We then identify a combinatorial dimension related to the Graph dimension that characterizes ERM learnability in the realizable setting. Finally, we establish a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, and conjecture that it may also be sufficient in this context. Additionally, in the context of online learning we provide a dimension that characterizes the minimax instance optimal cumulative loss up to a constant factor and design an optimal online learner for realizable regression, thus resolving an open question raised by Daskalakis and Golowich in STOC '22.
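As a rough illustration of the realizable setting with absolute loss, the sketch below runs empirical risk minimization (ERM) over a small finite class. The class of linear functions `h_a(x) = a * x`, the slope grid, and the sample points are all hypothetical choices made for this example; this is not the paper's learner, only a minimal instance of ERM when some hypothesis in the class labels the data exactly.

```python
from typing import Callable, List, Sequence, Tuple

Hypothesis = Callable[[float], float]
Sample = Sequence[Tuple[float, float]]


def empirical_abs_loss(h: Hypothesis, sample: Sample) -> float:
    """Average absolute loss |h(x) - y| over the labeled sample."""
    return sum(abs(h(x) - y) for x, y in sample) / len(sample)


def erm(hypotheses: Sequence[Hypothesis], sample: Sample) -> Hypothesis:
    """Return a hypothesis with the smallest empirical absolute loss.

    In the realizable setting at least one hypothesis has zero loss,
    so the returned hypothesis is consistent with the sample.
    """
    return min(hypotheses, key=lambda h: empirical_abs_loss(h, sample))


# Hypothetical finite class: h_a(x) = a * x with values in [0, 1] for x in [0, 1].
slopes = [0.0, 0.25, 0.5, 0.75, 1.0]
hypotheses: List[Hypothesis] = [lambda x, a=a: a * x for a in slopes]

# Realizability assumption: labels come from a member of the class (slope 0.5).
target = hypotheses[2]
sample = [(x, target(x)) for x in (0.1, 0.4, 0.9)]

learned = erm(hypotheses, sample)
train_loss = empirical_abs_loss(learned, sample)
```

Here every hypothesis other than the target disagrees with the labels at each sampled point, so ERM recovers the target. For infinite classes this need not happen, which is why the abstract distinguishes ERM learnability from PAC learnability.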

Paper Structure

This paper contains 37 sections, 18 theorems, 95 equations, 1 figure, 3 algorithms.

Key Result

Lemma 1

For every $(\varepsilon, \delta) \in (0,1)^2$ and every $\mathcal{H} \subseteq [0,1]^\mathcal{X}$, where $\mathcal{X}$ is the input domain, it holds that

Figures (1)

  • Figure 1: Landscape of Realizable PAC Regression: the "deleted" arrows mean that the implication does not hold. The equivalence between finite fat-shattering dimension and the uniform convergence property is known even in the realizable case (see shalev2010learnability), and the fact that PAC learnability requires finite scaled Natarajan dimension is proved in simon1997bounds. The properties of the other three dimensions (scaled Graph dimension, scaled One-Inclusion-Graph (OIG) dimension, and scaled Daniely-Shalev-Shwartz (DS) dimension) are shown in this work. We further conjecture that finite scaled Natarajan dimension is not sufficient for PAC learning, while finite scaled DS dimension does suffice. Interestingly, we observe that the notions of uniform convergence, learnability by any ERM, and PAC learnability are separated in realizable regression.

Theorems & Definitions (64)

  • Definition 1: PAC Realizable Regression
  • Definition 2: Online Realizable Regression
  • Definition 3: Projection of $\mathcal{H}$ to $S$
  • Definition 4: $\gamma$-Fat Shattering Dimension kearns1994efficient
  • Example 1: Realizable Learnability $\nRightarrow$ Finite Fat-Shattering Dimension, see Section 6 in bartlett1994fat
  • Definition 5: $\gamma$-Natarajan Dimension simon1997bounds
  • Definition 6: $\gamma$-Graph Dimension
  • Definition 7: One-Inclusion Hypergraph haussler1994predicting, rubinstein2009shifting, brukhim2022characterization
  • Definition 8: Orientation and Scaled Out-Degree
  • Definition 9: $\gamma$-OIG Dimension
  • ...and 54 more