
Sample compression schemes for balls in graphs

Jérémie Chalopin, Victor Chepoi, Fionn Mc Inerney, Sébastien Ratel, Yann Vaxès

TL;DR

The paper studies sample compression schemes for the family of balls in graphs, connecting the general question of whether VC-dimension bounds compression size to concrete constructions for specific graph classes. It gives explicit, small proper labeled and unlabeled sample compression schemes (LSCS and USCS) for several graph families (trees, cycles, interval graphs, cube-free median graphs, split graphs, planar graphs, and hyperbolic graphs), including variants for balls of a fixed radius and approximate schemes for δ-hyperbolic graphs. Key results include schemes of size 2 for metric trees, size 3 for cycles, size 6 for trees of cycles, size 4 for interval graphs, size 22 for cube-free median graphs, and size 4 for balls of radius 1 in planar graphs, along with a general approximate framework for δ-hyperbolic graphs. These results address the sample compression conjecture for a broad range of graph classes and provide compression schemes tailored to specific geometric structures. Several questions remain open, in particular on the optimality of the sizes and on extensions to further graph classes.

Abstract

One of the open problems in machine learning is whether any set-family of VC-dimension $d$ admits a sample compression scheme of size $O(d)$. In this paper, we study this problem for balls in graphs. For a ball $B=B_r(x)$ of a graph $G=(V,E)$, a realizable sample for $B$ is a signed subset $X=(X^+,X^-)$ of $V$ such that $B$ contains $X^+$ and is disjoint from $X^-$. A proper sample compression scheme of size $k$ consists of a compressor and a reconstructor. The compressor maps any realizable sample $X$ to a subsample $X'$ of size at most $k$. The reconstructor maps each such subsample $X'$ to a ball $B'$ of $G$ such that $B'$ includes $X^+$ and is disjoint from $X^-$. For balls of arbitrary radius $r$, we design proper labeled sample compression schemes of size $2$ for trees, of size $3$ for cycles, of size $4$ for interval graphs, of size $6$ for trees of cycles, and of size $22$ for cube-free median graphs. For balls of a given radius, we design proper labeled sample compression schemes of size $2$ for trees and of size $4$ for interval graphs. We also design approximate sample compression schemes of size $2$ for balls of $\delta$-hyperbolic graphs.
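
To make the definitions in the abstract concrete, here is a minimal Python sketch, not taken from the paper. It computes a ball $B_r(x)$ by breadth-first search, tests whether a ball realizes a signed sample $(X^+,X^-)$, and brute-force checks whether a compressor/reconstructor pair is a proper labeled scheme of size $k$ on a small graph. The function names (`ball`, `realizes`, `find_realizing_ball`, `is_proper_scheme`) and the adjacency-dictionary representation are assumptions made for illustration. The exhaustive search is only feasible for tiny graphs and is unrelated to the paper's constructions.

```python
from collections import deque
from itertools import product


def ball(adj, center, radius):
    """Return B_r(x): the set of vertices at distance <= radius from center."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand past the requested radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def realizes(b, pos, neg):
    """A ball realizes (X+, X-) if it contains X+ and is disjoint from X-."""
    return pos <= b and not (neg & b)


def find_realizing_ball(adj, pos, neg):
    """Brute force: return some (center, radius) realizing the sample, or None."""
    for x in adj:
        for r in range(len(adj)):
            if realizes(ball(adj, x, r), pos, neg):
                return x, r
    return None


def is_proper_scheme(adj, compress, reconstruct, k):
    """Check the scheme on every realizable signed sample of the graph.

    compress(pos, neg) must return a signed subsample (sub_pos, sub_neg);
    reconstruct(sub_pos, sub_neg) must return a (center, radius) pair.
    """
    vertices = list(adj)
    # Each vertex is absent (0), positive (1), or negative (-1): 3^n samples.
    for signs in product((0, 1, -1), repeat=len(vertices)):
        pos = {v for v, s in zip(vertices, signs) if s == 1}
        neg = {v for v, s in zip(vertices, signs) if s == -1}
        if find_realizing_ball(adj, pos, neg) is None:
            continue  # only realizable samples need to be handled
        sub_pos, sub_neg = compress(pos, neg)
        if len(sub_pos) + len(sub_neg) > k:
            return False  # subsample too large
        if not (sub_pos <= pos and sub_neg <= neg):
            return False  # not a subsample of X
        x, r = reconstruct(sub_pos, sub_neg)
        if not realizes(ball(adj, x, r), pos, neg):
            return False  # reconstructed ball is inconsistent with X
    return True


# Illustrative graph: the path 0 - 1 - 2 - 3 - 4.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 4} for i in range(5)}
```

The key point the checker enforces is that the reconstructed ball must be consistent with the whole sample $X$, not only with the retained subsample $X'$. For instance, the identity compressor paired with `find_realizing_ball` as reconstructor passes the check for $k=5$ on this path but fails for $k=1$, since it never discards any sample point.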

Paper Structure (14 sections, 27 theorems, 5 equations, 8 figures)

Key Result

Proposition 1

For any tree $T=(V,E)$, the pair $(\alpha,\beta)$ of maps defines a proper unlabeled sample compression scheme of size $2$ for $\mathcal{B}(T)$.

Figures (8)

  • Figure 1: On the left is an illustration of the vertices $\phi_s^+(v)$ and $\phi_s^-(v)$ associated to a vertex $v \in V(T)$ with respect to a vertex $s \in X^-$. On the right is an example of the possible center designators of a vertex $s \in X^-$. The vertices $t \in X^+$ such that the $r$-ball centered at $\phi_s^+(t)$ realizes $X$ are in blue. The vertices $t \in X^-$ such that the $r$-ball centered at $\phi_s^-(t)$ realizes $X$ are in red.
  • Figure 2: The vertices and sets used in the proper labeled sample compression scheme for trees of cycles. The ball $B_r(x)$ is represented in red. The cycles outside $C(u^+,v^+)$ are represented as paths.
  • Figure 3: Definition and positioning of $s$, $t$, and $z$ in the four cases of Lemma \ref{lem:r_y^*_and_r_x^*}.
  • Figure 4: Illustration of the graph $G$ in Example \ref{ex:cube-free_med_VCdim} that is used to show that the balls of cube-free median graphs have VC-dimension at least $4$. The set of $4$ vertices in red with labels $1,2,3,4$ is shattered by $\mathcal{B}(G)$.
  • Figure 5: On the left, the region $\mathbf{R}$ and the halfstrips $\mathbf{S}_1'(x)$, $\mathbf{S}_2''(x)$, $\mathbf{S}_3'(x)$, and $\mathbf{S}_4''(x)$. On the right, the regions $\mathbf{R}$, $\mathbf{R}'$, and $\mathbf{R}''$ computed from $\alpha(X)$. Steps 1-4 of the reconstruction correspond to the black, green, blue, and red parts of the figure. The target center $x$ is given in gray.
  • ...and 3 more figures

Theorems & Definitions (62)

  • Proposition 1
  • proof
  • Claim 2
  • proof
  • Proposition 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • ...and 52 more