Sample Compression Scheme Reductions
Idan Attias, Steve Hanneke, Arvind Ramaswami
TL;DR
The paper develops a framework of reductions that transfer sample compression guarantees from binary classifiers to multiclass, regression, and adversarially robust learning. By relating the key complexity measures (graph dimension $d_\mathrm{G}$ and pseudo-dimension $d_\mathrm{P}$) to binary compression bounds $f(d_\mathrm{VC})$, it establishes that multiclass schemes can achieve sizes $O(f(d_\mathrm{G}))$ (often with a $\log|Y|$ factor), and regression schemes can achieve $ε$-approximate compressions of size $O(f(d_\mathrm{P}))$ up to log factors, under various reconstruction assumptions (majority vote, proper, stable). The adversarially robust setting is handled similarly by reducing to binary compression and yields $O(f(d_\mathrm{VC})\log M)$ bounds, with improvements under stability; a negative result shows that robustness can break the equivalence between learnability and bounded compression. The results illuminate how resolving the binary sample compression conjecture would cascade into broader learning settings and clarify the limits of compression-based generalization in robust contexts. Overall, the work provides a unified reduction toolkit and precise bounds linking binary compression to multiclass, regression, and robust learning, along with open questions about infinitized/inflated schemes and fat-shattering dimensions.
Abstract
We present novel reductions from sample compression schemes in multiclass classification, regression, and adversarially robust learning settings to binary sample compression schemes. Assuming we have a compression scheme for binary classes of size $f(d_\mathrm{VC})$, where $d_\mathrm{VC}$ is the VC dimension, then we have the following results: (1) If the binary compression scheme is a majority-vote or a stable compression scheme, then there exists a multiclass compression scheme of size $O(f(d_\mathrm{G}))$, where $d_\mathrm{G}$ is the graph dimension. Moreover, for general binary compression schemes, we obtain a compression of size $O(f(d_\mathrm{G})\log|Y|)$, where $Y$ is the label space. (2) If the binary compression scheme is a majority-vote or a stable compression scheme, then there exists an $ε$-approximate compression scheme for regression over $[0,1]$-valued functions of size $O(f(d_\mathrm{P}))$, where $d_\mathrm{P}$ is the pseudo-dimension. For general binary compression schemes, we obtain a compression of size $O(f(d_\mathrm{P})\log(1/ε))$. These results would have significant implications if the sample compression conjecture, which posits that any binary concept class with a finite VC dimension admits a binary compression scheme of size $O(d_\mathrm{VC})$, is resolved (Littlestone and Warmuth, 1986; Floyd and Warmuth, 1995; Warmuth, 2003). Our results would then extend the proof of the conjecture immediately to other settings. We establish similar results for adversarially robust learning and also provide an example of a concept class that is robustly learnable but has no bounded-size compression scheme, demonstrating that learnability is not equivalent to having a compression scheme independent of the sample size, unlike in binary classification, where compression of size $2^{O(d_\mathrm{VC})}$ is attainable (Moran and Yehudayoff, 2016).
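To make the central object concrete, here is a minimal sketch of a binary sample compression scheme. It is not taken from the paper. It uses the standard textbook class of one-dimensional thresholds $h_t(x) = 1$ if $x \ge t$ and $0$ otherwise, which has VC dimension 1 and admits a scheme of size 1. The names `compress`, `reconstruct`, and `is_consistent` are illustrative assumptions, and the sample is assumed to be realizable by some threshold.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (point x, binary label y); hypothetical type alias


def compress(sample: List[Example]) -> List[Example]:
    """Keep at most one example: the positive example with the smallest x."""
    positives = [ex for ex in sample if ex[1] == 1]
    if not positives:
        # No positive examples: the empty set encodes the all-zero hypothesis.
        return []
    return [min(positives, key=lambda ex: ex[0])]


def reconstruct(compressed: List[Example]) -> Callable[[float], int]:
    """Rebuild a threshold hypothesis from the compressed set alone."""
    if not compressed:
        return lambda x: 0
    threshold = compressed[0][0]
    return lambda x: 1 if x >= threshold else 0


def is_consistent(h: Callable[[float], int], sample: List[Example]) -> bool:
    """Check that h labels every example of the original sample correctly."""
    return all(h(x) == y for x, y in sample)


# Usage: a sample realizable by a threshold (assumed), compressed to one point.
sample = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
kept = compress(sample)
hypothesis = reconstruct(kept)
print(len(kept), is_consistent(hypothesis, sample))
```

The reconstructed hypothesis depends only on the retained example, yet it agrees with every label in the original sample. That is the property whose size the bounds $f(d_\mathrm{VC})$, $O(f(d_\mathrm{G}))$, and $O(f(d_\mathrm{P}))$ above quantify for richer classes.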
