Agnostic Sample Compression Schemes for Regression

Idan Attias, Steve Hanneke, Aryeh Kontorovich, Menachem Sadigurschi

TL;DR

The paper delivers the first positive results for bounded agnostic sample compression in regression under $\ell_p$ losses, introducing a boosting-based framework to obtain $\alpha$-approximate compressions for real-valued function classes with finite fat-shattering dimension, independently of the sample size $m$. It specializes the general approach to linear regression, yielding exact compressions of size $d+1$ for $\ell_1$ and $d+2$ for $\ell_\infty$, and an $\mathcal{O}(d\log(p/\alpha))$-sized approximate scheme for $p\in(1,\infty)$, while proving a lower bound ruling out constant-size exact compression for $p\in(1,\infty)$. The work connects to and extends prior results on compression in agnostic settings, including negative results for $\ell_2$ and realizable-regime gains via LAD and SVM-based reductions, and it raises open questions about compression sizes tied to pseudo-dimension and fat-shattering, potentially generalizing Warmuth’s compression conjecture to agnostic regression. Overall, these results illuminate the fundamental trade-offs between model complexity, loss functions, and compressibility in agnostic regression, with implications for generalization and algorithmic design.

Abstract

We obtain the first positive results for bounded sample compression in the agnostic regression setting with the $\ell_p$ loss, where $p\in [1,\infty]$. We construct a generic approximate sample compression scheme for real-valued function classes exhibiting exponential size in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, an approximate compression of size linear in the dimension is constructed. Moreover, for $\ell_1$ and $\ell_\infty$ losses, we can even exhibit an efficient exact sample compression scheme of size linear in the dimension. We further show that for every other $\ell_p$ loss, $p\in (1,\infty)$, there does not exist an exact agnostic compression scheme of bounded size. This refines and generalizes a negative result of David, Moran, and Yehudayoff for the $\ell_2$ loss. We close by posing general open questions: for agnostic regression with $\ell_1$ loss, does every function class admit an exact compression scheme of size equal to its pseudo-dimension? For the $\ell_2$ loss, does every function class admit an approximate compression scheme of polynomial size in the fat-shattering dimension? These questions generalize Warmuth's classic sample compression conjecture for realizable-case classification.
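
The notions of "exact" and "approximate" compression used in the abstract can be made concrete with a short formal sketch. The block below is one common way to write them down, not a quotation from the paper: the symbols $\kappa$ (compression map), $\rho$ (reconstruction map), $k$ (compression size), and the normalization of $L_p$ are assumptions made here for illustration, and the sketch ignores any side-information bits a scheme may carry.

```latex
% Assumed notation (illustrative only): kappa, rho, k, and the 1/p-th root
% normalization of L_p are not taken verbatim from the paper.
\[
  L_p(f,S) = \Bigl(\tfrac{1}{m}\sum_{i=1}^{m} |f(x_i)-y_i|^p\Bigr)^{1/p}
  \quad (1 \le p < \infty),
  \qquad
  L_\infty(f,S) = \max_{i\in[m]} |f(x_i)-y_i| .
\]
% An alpha-approximate agnostic compression scheme of size k for the class F:
\[
  \kappa(S) \subseteq S, \qquad |\kappa(S)| \le k, \qquad
  L_p\bigl(\rho(\kappa(S)),\,S\bigr) \;\le\; \inf_{f\in\mathcal{F}} L_p(f,S) + \alpha
  \quad \text{for every finite sample } S .
\]
% The scheme is exact when alpha = 0, and bounded when k does not depend on m.
```

Under this reading, the abstract's positive results say that $k$ can be taken independent of $m$, and its negative result says that no bounded $k$ suffices for $\alpha = 0$ when $p\in(1,\infty)$.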

Paper Structure

This paper contains 20 sections, 11 theorems, 64 equations, 1 figure, 1 table, 2 algorithms.

Key Result

Theorem 3

Let $\mathcal{F} \subseteq [0,1]^\mathcal{X}$ be a function class, $S = \{(x_i,y_i): i\in [m]\}\subseteq \mathcal{X} \times [0,1]$ a sample, $\alpha\in [0,1]$ an approximation parameter, $\beta \in (0,1/2]$ a weak learner parameter, and consider the $\ell_p$ loss with $p\in [1,\infty]$. By setting the compression algorithm (alg:compression) with [...] we get an $\alpha$-approximate sample compression scheme of size [...] for some universal constant $c>0$.

Figures (1)

  • Figure 1: A sample $S$ of $m=20$ points $(x_i,y_i)$ was drawn iid uniformly from $[0,1]^2$. On this sample, $\ell_1$ regression was performed by solving the LP in (\ref{eq:reg-lp1}), shown on the left, and $\ell_\infty$ regression was performed by solving the LP in (\ref{eq:reg-lpinf}), shown on the right. In each case, the regressor provided by the LP solver is indicated by the thick (red) line. Notice that for $\ell_1$, the line contains exactly $2$ datapoints. For $\ell_\infty$, the regressor contains no datapoints; rather, the $d+2=3$ "support vectors" are marked in the plot.
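
The $\ell_1$ observation in the caption (the optimal line passes through $d+1=2$ datapoints when $d=1$) is what makes exact compression possible: storing those two points is enough to rebuild the regressor. The sketch below illustrates this for one-dimensional inputs only. It is not the paper's method: instead of solving an LP, it brute-forces all pairs of points, which relies on the standard fact that some least-absolute-deviation line interpolates two sample points with distinct $x$ values. The function names `compress` and `reconstruct` and the toy sample are hypothetical, chosen here for illustration.

```python
from itertools import combinations

def l1_loss(a, b, sample):
    """Mean absolute error of the line y = a*x + b on the sample."""
    return sum(abs(a * x + b - y) for x, y in sample) / len(sample)

def line_through(p, q):
    """Slope and intercept of the line through two points with distinct x."""
    (x1, y1), (x2, y2) = p, q
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1

def compress(sample):
    """Hypothetical compression map: keep the pair of sample points whose
    interpolating line has the smallest l1 loss on the whole sample."""
    pairs = (pq for pq in combinations(sample, 2) if pq[0][0] != pq[1][0])
    return min(pairs, key=lambda pq: l1_loss(*line_through(*pq), sample))

def reconstruct(compressed):
    """Hypothetical reconstruction map: rebuild the regressor from the
    two stored points alone, without access to the rest of the sample."""
    return line_through(*compressed)

# Toy sample (made up for illustration).
sample = [(0.0, 0.1), (0.2, 0.35), (0.4, 0.3), (0.6, 0.8), (0.8, 0.7), (1.0, 1.2)]

kept = compress(sample)          # d + 1 = 2 points
a, b = reconstruct(kept)         # line recovered from the compressed set
print("kept points:", kept)
print("slope, intercept:", a, b)
print("l1 loss on full sample:", l1_loss(a, b, sample))
```

The brute-force search costs $O(m^3)$ time and is only meant to make the compression and reconstruction roles visible; an LP solver, as used for the figure, would be the practical choice for larger samples or higher dimension.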

Theorems & Definitions (20)

  • Remark 1
  • Remark 2
  • Theorem 3: Approximate compression for agnostic regression
  • Definition 4: Approximate weak real-valued learners
  • Corollary 5
  • Theorem 6: Approximate compression for agnostic linear regression
  • Proof
  • Theorem 7
  • Proof
  • Theorem 8
  • ...and 10 more