Going Beyond Feature Similarity: Effective Dataset Distillation based on Class-Aware Conditional Mutual Information

Xinhao Zhong, Bin Chen, Hao Fang, Xulin Gu, Shu-Tao Xia, En-Hui Yang

TL;DR

The paper introduces class-aware conditional mutual information (CMI) as a novel measure of synthetic dataset complexity and presents a CMI-enhanced loss $L = L_{DD} + \lambda \, \mathrm{CMI}_{emp}(S)$ to regularize dataset distillation. By estimating CMI in the feature space of pre-trained networks and aggregating across multiple proxies, the method constrains the synthetic data to be more class-centered and easier to learn, improving generalization and training efficiency. Empirical results on CIFAR, Tiny-ImageNet, and ImageNet-1K show consistent performance gains when the CMI term is added to diverse DD baselines, along with cross-architecture robustness and faster convergence. The approach is plug-and-play and scalable, and it reduces training time while yielding higher or comparable accuracy. Future directions include extensions to diffusion- or GAN-based DD.

Abstract

Dataset distillation (DD) aims to minimize the time and memory needed to train deep neural networks on large datasets by creating a smaller synthetic dataset on which a trained network performs similarly to one trained on the full real dataset. However, current dataset distillation methods often produce synthetic datasets that are excessively difficult for networks to learn from, because they compress a substantial amount of information from the original data through metrics that measure feature similarity, e.g., distribution matching (DM). In this work, we introduce conditional mutual information (CMI) to assess the class-aware complexity of a dataset and propose a novel method based on minimizing CMI. Specifically, we minimize the distillation loss while simultaneously constraining the class-aware complexity of the synthetic dataset by minimizing its empirical CMI, computed in the feature space of pre-trained networks. Through a thorough set of experiments, we show that our method can serve as a general regularizer for existing DD methods and improves both performance and training efficiency.
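
As a rough illustration of the regularized objective $L = L_{DD} + \lambda \, \mathrm{CMI}_{emp}(S)$, the following Python sketch computes one plausible empirical CMI estimate. The estimator form is an assumption, since this summary does not spell it out: each synthetic sample is mapped to a predictive distribution by a pre-trained proxy network, and CMI is taken as the average KL divergence between each sample's distribution and the mean distribution of its class. The function names (`empirical_cmi`, `cmi_regularized_loss`), the use of raw logits as input, and the default `lam=0.1` are hypothetical choices for illustration only.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def empirical_cmi(logits_per_sample, labels):
    """Assumed estimator: mean KL between each sample's predictive
    distribution and the average distribution of its own class."""
    probs = [softmax(row) for row in logits_per_sample]

    # Group predictive distributions by class label.
    by_class = defaultdict(list)
    for p, y in zip(probs, labels):
        by_class[y].append(p)

    total = 0.0
    for class_probs in by_class.values():
        n_out = len(class_probs[0])
        # Class "centroid" in probability space.
        centroid = [
            sum(p[k] for p in class_probs) / len(class_probs)
            for k in range(n_out)
        ]
        total += sum(kl_divergence(p, centroid) for p in class_probs)
    return total / len(probs)


def cmi_regularized_loss(dd_loss, logits_per_sample, labels, lam=0.1):
    """L = L_DD + lambda * CMI_emp(S); lam is an illustrative value."""
    return dd_loss + lam * empirical_cmi(logits_per_sample, labels)
```

Under this assumed estimator, a synthetic set whose samples all produce the same prediction within each class has a CMI of zero, while samples scattered around their class mean raise the penalty. That matches the intuition above that minimizing CMI pushes synthetic data to be more class-centered. In an actual distillation pipeline the term would be computed with a differentiable framework so that gradients reach the synthetic images; the pure-Python version here only shows the arithmetic.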

Paper Structure

This paper contains 26 sections, 9 equations, 11 figures, 11 tables, 1 algorithm.

Figures (11)

  • Figure 1: Visualization of the synthetic dataset generated by DM with (a) high CMI value, and (b) low CMI value.
  • Figure 2: Ablation study on the weighting parameter $\lambda$.
  • Figure 3: Accuracy curves with and without the CMI constraint.
  • Figure 4: Visualization comparison between DSA (Left column) and DSA with CMI constraint (Right column).
  • Figure 5: The $\mathcal{L}_{IDC}$ curves while training on CIFAR10 under IPC=10.
  • ...and 6 more figures