Data-dependent Generalization Bounds via Variable-Size Compressibility

Milad Sefidgaran, Abdellatif Zaidi

TL;DR

The paper derives data-dependent generalization bounds by introducing a variable-size compressibility framework, which ties the generalization error of an algorithm to a data-dependent compression rate of its input data. It establishes tail bounds, tail bounds on the expectation, and in-expectation bounds, and shows that these subsume and extend existing PAC-Bayes and intrinsic-dimension bounds. New results include lossy PAC-Bayes bounds and a trajectory-based dimension bound obtained from the rate-distortion of optimization paths. By unifying rate-distortion, PAC-Bayes, and dimension-based approaches, the framework yields guarantees that depend on the observed data and gives insight into the role of trajectory compressibility in generalization. Experiments on CIFAR-10 corroborate the theoretical link between the compressibility of optimization trajectories and generalization performance.

Abstract

In this paper, we establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework that we introduce here. In this framework, the generalization error of an algorithm is linked to a variable-size "compression rate" of its input data. This is shown to yield bounds that depend on the empirical measure of the input data at hand, rather than on its unknown distribution. The new generalization bounds that we establish are tail bounds, tail bounds on the expectation, and in-expectation bounds. Moreover, it is shown that our framework also allows one to derive general bounds on any function of the input data and output hypothesis random variables. In particular, these general bounds are shown to subsume, and possibly improve over, several existing PAC-Bayes and data-dependent intrinsic dimension-based bounds, which are recovered as special cases, thus unveiling a unifying character of our approach. For instance, a new data-dependent intrinsic dimension-based bound is established, which connects the generalization error to the optimization trajectories and reveals various interesting connections with the rate-distortion dimension of a process, the Rényi information dimension of a process, and the metric mean dimension.
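
For orientation, the quantities the abstract refers to can be written out in standard learning-theory notation. This is a sketch under assumptions: the symbols $n$, $\mu$, $\hat{\mu}_S$, $\mathcal{L}$, and $\hat{\mathcal{L}}$, as well as the i.i.d. sampling model, are chosen here for illustration and may differ from the paper's own definitions; only $S$, $W$, $Z$, and $\ell$ appear in the theorem statement reproduced on this page.

```latex
% Assumed setup: a training sample of n i.i.d. points from the unknown distribution \mu
S = (Z_1, \dots, Z_n), \qquad Z_i \sim \mu

% Empirical measure of the observed sample (the object data-dependent bounds are stated in terms of)
\hat{\mu}_S = \frac{1}{n} \sum_{i=1}^{n} \delta_{Z_i}

% Population risk and empirical risk of a hypothesis w under the loss \ell
\mathcal{L}(w) = \mathbb{E}_{Z \sim \mu}\bigl[\ell(Z, w)\bigr],
\qquad
\hat{\mathcal{L}}(S, w) = \frac{1}{n} \sum_{i=1}^{n} \ell(Z_i, w)

% Generalization error of the hypothesis W output by the algorithm on input S
\operatorname{gen}(S, W) = \mathcal{L}(W) - \hat{\mathcal{L}}(S, W)
```

Under this reading, a data-dependent bound controls $\operatorname{gen}(S, W)$ by a quantity computable from $\hat{\mu}_S$, without requiring knowledge of $\mu$.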

Paper Structure (37 sections, 13 theorems, 107 equations, 1 figure)

This paper contains 37 sections, 13 theorems, 107 equations, 1 figure.

Key Result

Theorem 1

If the algorithm $\mathcal{A}$ is $(R_{S,W}, \epsilon, \delta; d_m)$-compressible and, for all $w \in \mathcal{W}$, $\ell(Z,w)$ is $\sigma$-subgaussian, then with probability at least $(1-\delta)$,

Figures (1)

  • Figure 1: Estimated generalization error and (an approximation of) the bound of Theorem \ref{th:RDProcess}, computed for FCN4 and CNN4, both trained on the CIFAR-10 dataset with various learning rates. The estimated generalization error is plotted against the left-hand Y-axis; the approximation of the bound of Theorem \ref{th:RDProcess} is plotted against the right-hand Y-axis.
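
The "estimated generalization error" in the figure caption is commonly obtained as the gap between the average loss on held-out data and the average loss on the training data. The sketch below illustrates that estimator on toy data. It is an assumption about the general technique, not the authors' experimental code: the function names, the 0-1 loss, the threshold classifier, and the data are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy data format: (feature, label)
Example = Tuple[float, int]
Predictor = Callable[[float], int]


def zero_one_loss(predict: Predictor, example: Example) -> float:
    """Return 0.0 for a correct prediction and 1.0 for an incorrect one."""
    x, y = example
    return 0.0 if predict(x) == y else 1.0


def empirical_risk(predict: Predictor, data: Sequence[Example]) -> float:
    """Average 0-1 loss of the predictor over a finite dataset."""
    return sum(zero_one_loss(predict, z) for z in data) / len(data)


def estimated_generalization_error(
    predict: Predictor,
    train: Sequence[Example],
    held_out: Sequence[Example],
) -> float:
    """Held-out risk minus training risk.

    The held-out risk stands in for the population risk, which is unknown.
    """
    return empirical_risk(predict, held_out) - empirical_risk(predict, train)


def threshold_classifier(x: float) -> int:
    """Hypothetical fixed hypothesis: predict 1 when the feature exceeds 0.5."""
    return 1 if x > 0.5 else 0


# Toy datasets (illustrative values only, not from the paper)
train_set = [(0.1, 0), (0.9, 1), (0.7, 1), (0.2, 0)]
held_out_set = [(0.3, 0), (0.6, 0), (0.8, 1), (0.4, 1)]

gap = estimated_generalization_error(threshold_classifier, train_set, held_out_set)
print(f"estimated generalization error: {gap}")
```

In an experiment such as the one the caption describes, the predictor would be the trained network and the loss would be whatever the training objective uses; the estimator itself has the same shape.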

Theorems & Definitions (15)

  • Definition 1: Sefidgaran2022
  • Definition 2: Variable-size compressibility
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Proposition 1
  • Theorem 6
  • Theorem 7
  • ...and 5 more