Online Realizable Regression and Applications for ReLU Networks

Ilan Doron-Arad, Idan Mehalel, Elchanan Mossel

TL;DR

A generic potential method that upper bounds the cumulative loss of realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics), together with a sharp $q$-vs.-$d$ dichotomy for realizable online learning.

Abstract

Realizable online regression can behave very differently from online classification. Even without any margin or stochastic assumptions, realizability may enforce horizon-free (finite) cumulative loss under metric-like losses, even when the analogous classification problem has an infinite mistake bound. We study realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics). Recent work of Attias et al. shows that the minimax realizable cumulative loss is characterized by the scaled Littlestone/online dimension $\mathbb{D}_{\mathrm{onl}}$, but this quantity can be difficult to analyze. Our main contribution is a generic potential method that upper bounds $\mathbb{D}_{\mathrm{onl}}$ by a concrete Dudley-type entropy integral that depends only on covering numbers of the hypothesis class under the induced sup pseudo-metric. We define an entropy potential $\Phi(\mathcal{H})=\int_{0}^{\operatorname{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$, where $N(\mathcal{H},\varepsilon)$ is the $\varepsilon$-covering number of $\mathcal{H}$, and show that for every $c$-approximate pseudo-metric loss, $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,\Phi(\mathcal{H})$. In particular, polynomial metric entropy implies $\Phi(\mathcal{H})<\infty$ and hence a horizon-free realizable cumulative-loss bound with transparent dependence on effective dimension. We illustrate the method on two families. We prove a sharp $q$-vs.-$d$ dichotomy for realizable online learning (finite and efficiently achievable $\Theta_{d,q}(L^d)$ total loss for $L$-Lipschitz regression iff $q>d$, otherwise infinite), and for bounded-norm $k$-ReLU networks separate regression (finite loss, even $\widetilde O(k^2)$, and $O(1)$ for one ReLU) from classification (impossible already for $k=2,d=1$).
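To make the claim that polynomial metric entropy gives a finite entropy potential concrete, here is a minimal numerical sketch. It assumes a hypothetical covering-number profile $N(\mathcal{H},\varepsilon)=(A/\varepsilon)^D$ (the constants `A`, `D`, `diam` and the function names are illustrative, not taken from the paper). For that profile the integral has the closed form $D\cdot\operatorname{diam}\cdot(\log(A/\operatorname{diam})+1)$, which the code compares against a midpoint-rule approximation.

```python
import math


def entropy_potential(log_cover, diam, steps=100_000):
    """Midpoint-rule approximation of the integral of log N(H, eps) over (0, diam)."""
    h = diam / steps
    return sum(log_cover((i + 0.5) * h) for i in range(steps)) * h


# Assumed (hypothetical) polynomial metric entropy: N(H, eps) = (A / eps)^D.
A, D, diam = 2.0, 3.0, 1.0


def log_cover(eps):
    # log N(H, eps) = D * log(A / eps); it blows up only logarithmically at 0,
    # so the integral over (0, diam) is still finite.
    return D * math.log(A / eps)


numeric = entropy_potential(log_cover, diam)
closed_form = D * diam * (math.log(A / diam) + 1.0)

print(f"numeric     = {numeric:.6f}")
print(f"closed form = {closed_form:.6f}")
```

The two values should agree up to the discretization error of the midpoint rule, and the closed form shows the potential growing linearly in the exponent `D` under this assumed profile.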

Paper Structure

This paper contains 65 sections, 48 theorems, 224 equations, 2 figures, 1 table.

Key Result

Theorem 1.1

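Assume $\ell$ is a $c$-approximate pseudo-metric for some $c\ge 1$ and $\operatorname{diam}(\mathcal{H})<\infty$. Then, in the form stated in the abstract, $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,\Phi(\mathcal{H})$, where $\Phi(\mathcal{H})=\int_{0}^{\operatorname{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$ is the entropy potential.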

Figures (2)

  • Figure 1: Lipschitz realizable regression under $\ell_q(y,y')=|y-y'|^q$ loss: Divergence for $q\leq d$ and finiteness for $q>d$.
  • Figure 2: Illustration of the inequality $N(U,\varepsilon)\ \ge\ N(U_0,\varepsilon)+N(U_1,\varepsilon)$. Below the gap scale, no $\varepsilon$-ball can hit both children, so any $\varepsilon$-cover splits into disjoint covers of $U_0$ and $U_1$.
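The splitting inequality in the caption of Figure 2 can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the sets `U0` and `U1` are hypothetical finite point sets on the real line, and an $\varepsilon$-ball is taken to be a closed interval of radius $\varepsilon$ centered anywhere. For that setting a left-to-right greedy sweep computes the minimum cover size exactly.

```python
def covering_number(points, eps):
    """Minimum number of closed intervals of radius eps needed to cover the points.

    Greedy sweep: start an interval at the leftmost uncovered point; it then
    covers everything up to that point plus 2 * eps.
    """
    count = 0
    right = float("-inf")  # right end of the most recently placed interval
    for x in sorted(points):
        if x > right:
            count += 1
            right = x + 2 * eps
    return count


# Two hypothetical "children" separated by a gap of 7 (from 3 to 10).
U0 = [0, 1, 2, 3]
U1 = [10, 11, 12, 13]
U = U0 + U1

for eps in (1, 7):
    whole = covering_number(U, eps)
    parts = covering_number(U0, eps) + covering_number(U1, eps)
    print(f"eps={eps}: N(U)={whole}, N(U0)+N(U1)={parts}")
```

With `eps=1` an interval has length 2, shorter than the gap, so no interval can hit both children and the cover of `U` splits into covers of `U0` and `U1`. With `eps=7` a single interval covers everything, so the inequality no longer holds at that scale.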

Theorems & Definitions (101)

  • Theorem 1.1: Online dimension via entropy potential
  • Corollary 1.2
  • Theorem 1.3
  • Corollary 1.4
  • Theorem 1.5
  • Theorem 1.6
  • Theorem 1.7
  • Theorem 1.8: Proper online hardness
  • Proposition 1.9: $\Phi(\mathcal{H})=\infty$ while $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})<\infty$
  • Definition 2.1: $c$-approximate pseudo-metric
  • ...and 91 more