Dual VC Dimension Obstructs Sample Compression by Embeddings
Zachary Chase, Bogdan Chornomaz, Steve Hanneke, Shay Moran, Amir Yehudayoff
TL;DR
The paper investigates how much VC dimension must increase when embedding a VC class into an extremal class, showing that in general the increase is at least exponential in the original VC dimension and that a universal embedding achieving only linear or polynomial growth is impossible. It introduces the dual VC dimension and proves a sharp upper bound for extremal classes: vc*(C) ≤ 2 vc(C) + 1, contrasting with Assouad's general bound. A unified abstract-convexity framework is developed via Radon numbers, establishing that for extremal C the dual Radon number satisfies r(C) ≤ 2 vc(C) + 1 and linking to the topological Radon theorem through a cube complex Q(C). These results imply fundamental limitations of Floyd and Warmuth’s sample compression program and provide a topological route to understanding the trade-offs between VC, dual VC, and dual Radon parameters in extremal classes.
Abstract
This work studies embedding of arbitrary VC classes in well-behaved VC classes, focusing particularly on extremal classes. Our main result expresses an impossibility: such embeddings necessarily require a significant increase in dimension. In particular, we prove that for every $d$ there is a class with VC dimension $d$ that cannot be embedded in any extremal class of VC dimension smaller than exponential in $d$. In addition to its independent interest, this result has an important implication in learning theory, as it reveals a fundamental limitation of one of the most extensively studied approaches to tackling the long-standing sample compression conjecture. Concretely, the approach proposed by Floyd and Warmuth entails embedding any given VC class into an extremal class of a comparable dimension, and then applying an optimal sample compression scheme for extremal classes. However, our results imply that this strategy would in some cases result in a sample compression scheme at least exponentially larger than what is predicted by the sample compression conjecture. The above implications follow from a general result we prove: any extremal class with VC dimension $d$ has dual VC dimension at most $2d+1$. This bound is exponentially smaller than the classical bound $2^{d+1}-1$ of Assouad, which applies to general concept classes (and is known to be unimprovable for some classes). We in fact prove a stronger result, establishing that $2d+1$ upper bounds the dual Radon number of extremal classes. This theorem represents an abstraction of the classical Radon theorem for convex sets, extending its applicability to a wider combinatorial framework, without relying on the specifics of Euclidean convexity. The proof utilizes the topological method and is primarily based on variants of the Topological Radon Theorem.
