Class-wise Generalization Error: an Information-Theoretic Analysis

Firas Laakom, Yuheng Bu, Moncef Gabbouj

TL;DR

This work identifies and formalizes class-wise generalization error, demonstrating that standard average generalization bounds obscure per-class disparities observed in real neural networks. It develops a hierarchy of information-theoretic bounds—starting with a KL-based MI bound and extending to class-CMI, class-f-CMI, and class-Δ_y L-CMI—that quantify per-class generalization in a way that is practical to estimate. Empirical results on CIFAR-10 (and its noisy variant) show these bounds track the complex, class-dependent generalization behavior, offering predictive insight into which classes generalize poorly. The framework also extends to related settings, including standard generalization, sub-task problems under distribution shift, and fairness certificates with sensitive attributes, highlighting broad applicability. Overall, the paper advances understanding of how class-specific factors, learning dynamics, and data distributions interact to shape generalization in deep learning, with potential impact on targeted data augmentation and fairness-aware training strategies.

Abstract

Existing generalization theories of supervised learning typically take a holistic approach and provide bounds for the expected generalization over the whole data distribution, which implicitly assumes that the model generalizes similarly for all the classes. In practice, however, there are significant variations in generalization performance among different classes, which cannot be captured by the existing generalization bounds. In this work, we tackle this problem by theoretically studying the class-generalization error, which quantifies the generalization performance of each individual class. We derive a novel information-theoretic bound for class-generalization error using the KL divergence, and we further obtain several tighter bounds using the conditional mutual information (CMI), which are significantly easier to estimate in practice. We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior. Moreover, we show that the theoretical tools developed in this paper can be applied in several applications beyond this context.

Paper Structure

This paper contains 31 sections, 18 theorems, 84 equations, 7 figures.

Key Result

Lemma 1

Under Assumption fixed_joint, the class-generalization error in definition class_gen is given by

Figures (7)

  • Figure 1: Left: The standard generalization error (test loss minus train loss) and the generalization errors for several classes on CIFAR10 as a function of the number of training samples. Right: The standard generalization error, the bound proposed by harutyunyan2021information, and the generalization errors for several classes on noisy CIFAR10. Experimental details are available in Section \ref{num_results}.
  • Figure 2: Class-generalization error and our bounds in Theorems \ref{fCMI_class_bound} and \ref{deltaLCMI_class_bound} for the classes "trucks" (left) and "cats" (middle) on CIFAR10 (top) and noisy CIFAR10 (bottom), as the number of training samples increases. The right column shows scatter plots of the bound in Theorem \ref{deltaLCMI_class_bound} against the class-generalization error of the different classes for CIFAR10 (top) and noisy CIFAR10 (bottom).
  • Figure 3: The standard generalization error and the class-generalization errors for all classes on CIFAR10 as a function of the number of training samples.
  • Figure 4: Class-wise generalization on the 10 classes of CIFAR10 and the scatter plot between class-generalization error and the class-$f$-CMI bound in Theorem \ref{fCMI_class_bound}.
  • Figure 5: Class-wise generalization on the 10 classes of noisy CIFAR10 (clean validation) and the scatter plot between class-generalization error and the class-$f$-CMI bound in Theorem \ref{fCMI_class_bound}.
  • ...and 2 more figures
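
The quantities plotted in Figure 1 can be approximated empirically from per-sample losses. The sketch below is illustrative and is not the authors' code: it assumes per-sample losses and integer labels are already available as arrays, and the function name `class_generalization_gaps` and its parameters are hypothetical. It computes the standard gap (test loss minus train loss) and the same gap restricted to each class, which is an empirical proxy for the class-generalization error rather than the expectation-based definition used in the paper.

```python
import numpy as np


def class_generalization_gaps(train_losses, train_labels,
                              test_losses, test_labels, num_classes):
    """Return (standard_gap, per_class_gaps) from per-sample losses.

    Hypothetical helper: the gap is mean test loss minus mean train loss,
    computed over all samples and then separately for each class label.
    """
    train_losses = np.asarray(train_losses, dtype=float)
    test_losses = np.asarray(test_losses, dtype=float)
    train_labels = np.asarray(train_labels)
    test_labels = np.asarray(test_labels)

    # Standard generalization gap over the whole data set.
    standard_gap = test_losses.mean() - train_losses.mean()

    # Per-class gap; NaN marks classes missing from either split.
    per_class_gaps = np.full(num_classes, np.nan)
    for y in range(num_classes):
        train_y = train_losses[train_labels == y]
        test_y = test_losses[test_labels == y]
        if train_y.size > 0 and test_y.size > 0:
            per_class_gaps[y] = test_y.mean() - train_y.mean()
    return standard_gap, per_class_gaps


# Toy example with two classes: class 1 generalizes much worse than class 0,
# which the single averaged gap does not reveal.
standard_gap, per_class_gaps = class_generalization_gaps(
    train_losses=[0.1, 0.1, 0.5, 0.5], train_labels=[0, 0, 1, 1],
    test_losses=[0.2, 0.2, 1.5, 1.5], test_labels=[0, 0, 1, 1],
    num_classes=2,
)
```

In this toy input the averaged gap sits between the two per-class gaps, which is the kind of disparity the class-wise analysis is meant to expose.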

Theorems & Definitions (32)

  • Definition 1
  • Lemma 1
  • Theorem 1
  • Definition 2
  • Theorem 2: class-CMI
  • Theorem 3
  • Remark 1
  • Remark 2
  • Theorem 4
  • Corollary 1
  • ...and 22 more