Online Realizable Regression and Applications for ReLU Networks

Ilan Doron-Arad; Idan Mehalel; Elchanan Mossel

TL;DR

The paper gives a generic potential method that upper bounds the cumulative loss of realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics), and it proves a sharp $q$-vs.-$d$ dichotomy for realizable online learning.

Abstract

Realizable online regression can behave very differently from online classification. Even without any margin or stochastic assumptions, realizability may enforce horizon-free (finite) cumulative loss under metric-like losses, even when the analogous classification problem has an infinite mistake bound. We study realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics). Recent work of Attias et al. shows that the minimax realizable cumulative loss is characterized by the scaled Littlestone/online dimension $\mathbb{D}_{\mathrm{onl}}$, but this quantity can be difficult to analyze. Our main contribution is a generic potential method that upper bounds $\mathbb{D}_{\mathrm{onl}}$ by a concrete Dudley-type entropy integral that depends only on covering numbers of the hypothesis class under the induced sup pseudo-metric. We define an \emph{entropy potential} $\Phi(\mathcal{H})=\int_{0}^{\operatorname{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$, where $N(\mathcal{H},\varepsilon)$ is the $\varepsilon$-covering number of $\mathcal{H}$, and show that for every $c$-approximate pseudo-metric loss, $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,\Phi(\mathcal{H})$. In particular, polynomial metric entropy implies $\Phi(\mathcal{H})<\infty$ and hence a horizon-free realizable cumulative-loss bound with transparent dependence on effective dimension. We illustrate the method on two families. We prove a sharp $q$-vs.-$d$ dichotomy for realizable online learning (finite and efficiently achievable $\Theta_{d,q}(L^d)$ total loss for $L$-Lipschitz regression iff $q>d$, otherwise infinite), and for bounded-norm $k$-ReLU networks separate regression (finite loss, even $\widetilde O(k^2)$, and $O(1)$ for one ReLU) from classification (impossible already for $k=2,d=1$).
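
To see why the threshold $q>d$ is natural for the entropy potential, the following heuristic calculation may help. It is a sketch and not the paper's proof. It assumes the domain $[0,1]^d$, labels in $[0,1]$ (so the diameter under the induced pseudo-metric is at most $1$), and the classical sup-norm entropy estimate $\log N_\infty(\mathrm{Lip}_L, r)\asymp (L/r)^d$ for $L$-Lipschitz functions. The symbols $d_q$ and $N_\infty$ are notation introduced here for illustration.

```latex
% Heuristic only; constants depending on d and q are suppressed.
\begin{align*}
d_q(f,g) &= \sup_x |f(x)-g(x)|^q
  \quad\Longrightarrow\quad
  N(\mathcal{H},\varepsilon) = N_\infty\bigl(\mathcal{H},\varepsilon^{1/q}\bigr),\\
\log N(\mathcal{H},\varepsilon) &\asymp \bigl(L\,\varepsilon^{-1/q}\bigr)^{d}
  = L^{d}\,\varepsilon^{-d/q},\\
\int_{0}^{1} L^{d}\,\varepsilon^{-d/q}\,d\varepsilon
  &= \begin{cases}
       L^{d}\,\dfrac{q}{q-d}, & q>d,\\[4pt]
       \infty, & q\le d.
     \end{cases}
\end{align*}
```

Under these assumptions the integral is finite exactly when $q>d$ and then scales as $L^d$, which is consistent with the stated $\Theta_{d,q}(L^d)$ bound. A divergent integral does not by itself show infinite loss, since the entropy potential only gives an upper bound; the infinite-loss side for $q\le d$ needs a separate argument.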

Paper Structure (65 sections, 48 theorems, 224 equations, 2 figures, 1 table)

Introduction
Our Results
Related Work
Discussion
When the entropy potential diverges.
Open directions.
Improving (or justifying) the $\tilde{O}\left(k^2\right)$ for bounded $k$-ReLU.
Near-realizable sequences.
Efficiency.
Organization.
Preliminaries
Why an (approximate) triangle inequality is necessary.
A Potential Upper bound for Online Realizable Regression
Scaled Littlestone trees and a branch bound
Lipschitz Regression under $\ell_q$ Loss
...and 50 more sections

Key Result

Theorem 1.1

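Assume $\ell$ is a $c$-approximate pseudo-metric for some $c\ge 1$ and $\operatorname{diam}(\mathcal{H})<\infty$. Then $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,\Phi(\mathcal{H})$, where $\Phi(\mathcal{H})=\int_{0}^{\operatorname{diam}(\mathcal{H})}\log N(\mathcal{H},\varepsilon)\,d\varepsilon$ is the entropy potential.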
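
As a rough illustration of how the entropy potential $\Phi(\mathcal{H})=\int_{0}^{\operatorname{diam}(\mathcal{H})}\log N(\mathcal{H},\varepsilon)\,d\varepsilon$ behaves under polynomial metric entropy, the sketch below evaluates it numerically. It assumes a hypothetical covering bound $N(\mathcal{H},\varepsilon)=(A/\varepsilon)^D$ with $A\ge\operatorname{diam}(\mathcal{H})$. The names `entropy_potential`, `polynomial_log_cover`, and `closed_form`, and the parameters `A` and `D`, are illustrative and do not come from the paper.

```python
import math


def entropy_potential(log_cover, diam, steps=100_000):
    """Midpoint-rule estimate of the integral of log_cover(eps) over (0, diam]."""
    h = diam / steps
    return sum(log_cover((i + 0.5) * h) for i in range(steps)) * h


def polynomial_log_cover(A, D):
    """Hypothetical bound N(H, eps) = (A / eps)^D, so log N = D * log(A / eps)."""
    return lambda eps: D * math.log(A / eps)


def closed_form(A, D, diam):
    """Exact integral of D * log(A / eps) over (0, diam]."""
    return D * diam * (math.log(A / diam) + 1.0)


# Illustrative parameters (assumptions, not values from the paper).
A, D, diam = 2.0, 3, 1.0
numeric = entropy_potential(polynomial_log_cover(A, D), diam)
exact = closed_form(A, D, diam)
print(numeric, exact)
```

In this toy model the potential equals $D\cdot\operatorname{diam}(\mathcal{H})\,(\log(A/\operatorname{diam}(\mathcal{H}))+1)$, so it is finite and linear in the exponent $D$, which is one way to read the abstract's "transparent dependence on effective dimension".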

Figures (2)

Figure 1: Lipschitz realizable regression under $\ell_q(y,y')=|y-y'|^q$ loss: Divergence for $q\leq d$ and finiteness for $q>d$.
Figure 2: Illustration of the inequality $N(U,\varepsilon)\ \ge\ N(U_0,\varepsilon)+N(U_1,\varepsilon)$. Below the gap scale, no $\varepsilon$-ball can hit both children, so any $\varepsilon$-cover splits into disjoint covers of $U_0$ and $U_1$.
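
The splitting inequality in Figure 2 can be checked on a toy example. The sketch below uses a finite set of reals with the ordinary metric $|x-y|$, where a left-to-right greedy sweep gives the exact covering number when centers may lie anywhere on the line. The sets `U0` and `U1`, the helper names, and the threshold $2\varepsilon<\text{gap}$ are assumptions chosen for this one-dimensional toy and are not the paper's exact gap scale.

```python
def covering_number(points, eps):
    """Exact eps-covering number of a finite set of reals (centers anywhere on the line)."""
    pts = sorted(points)
    count, i = 0, 0
    while i < len(pts):
        count += 1
        # A closed ball of radius eps covers an interval of length 2 * eps,
        # so start it at the leftmost point that is still uncovered.
        reach = pts[i] + 2 * eps
        while i < len(pts) and pts[i] <= reach:
            i += 1
    return count


def splits(U0, U1, eps):
    """Check N(U, eps) >= N(U0, eps) + N(U1, eps) for U = U0 union U1."""
    U = U0 + U1
    return covering_number(U, eps) >= covering_number(U0, eps) + covering_number(U1, eps)


# Two hypothetical children separated by a gap of about 1.65.
U0 = [0.0, 0.1, 0.35]
U1 = [2.0, 2.05, 2.6]

print(splits(U0, U1, 0.1))  # 2 * eps is below the gap
print(splits(U0, U1, 1.5))  # 2 * eps exceeds the gap
```

When $2\varepsilon$ is smaller than the gap, no single ball meets both children and the inequality holds. Once a ball is wide enough to cover points from both sides, one ball can serve both children and the inequality can fail.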

Theorems & Definitions (101)

Theorem 1.1: Online dimension via entropy potential
Corollary 1.2
Theorem 1.3
Corollary 1.4
Theorem 1.5
Theorem 1.6
Theorem 1.7
Theorem 1.8: Proper online hardness
Proposition 1.9: $\Phi(\mathcal{H})=\infty$ while $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})<\infty$
Definition 2.1: $c$-approximate pseudo-metric
...and 91 more
