
Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation

Wenxiao Deng, Wenbin Li, Tianyu Ding, Lei Wang, Hongguang Zhang, Kuihua Huang, Jing Huo, Yang Gao

TL;DR

The paper tackles two core shortcomings of distribution-matching-based dataset distillation: dispersed intra-class features and an exclusive focus on mean alignment. It introduces two plug-in constraints, $\mathcal{L}_{CC}$ for class centralization and $\mathcal{L}_{CM}$ for local covariance matching, and integrates them with baseline DM/IDM objectives to form $\mathcal{L} = \mathcal{L}_{DM/IDM} + \lambda_{CC} \mathcal{L}_{CC} + \lambda_{CM} \mathcal{L}_{CM}$. Empirical results across SVHN, CIFAR10/100, and TinyImageNet show consistent improvements over state-of-the-art methods, with up to $6.6\%$ gains on CIFAR10 and strong cross-architecture generalization (max drop $1.7\%$). The approach enables more data-efficient training at various IPCs and compression ratios and offers practical benefits for continual learning scenarios, supported by ablation and visualization analyses.

Abstract

Dataset distillation has emerged as a promising approach in deep learning, enabling efficient training with small synthetic datasets derived from larger real ones. In particular, distribution matching-based distillation methods have attracted attention thanks to their effectiveness and low computational cost. However, these methods face two primary limitations: the feature distribution within each class of the synthetic dataset is dispersed, which reduces class discrimination, and they focus exclusively on mean feature consistency, which makes the matching imprecise and incomplete. To address these challenges, we introduce two novel constraints: a class centralization constraint and a covariance matching constraint. The class centralization constraint aims to enhance class discrimination by clustering samples within each class more closely. The covariance matching constraint seeks to achieve more accurate feature distribution matching between real and synthetic datasets through local feature covariance matrices, which is particularly beneficial when sample sizes are much smaller than the number of features. Experiments demonstrate notable improvements with these constraints, yielding performance boosts of up to 6.6% on CIFAR10, 2.9% on SVHN, 2.5% on CIFAR100, and 2.5% on TinyImageNet, compared to the relevant state-of-the-art methods. In addition, our method maintains robust performance in cross-architecture settings, with a maximum performance drop of 1.7% on four architectures. Code is available at https://github.com/VincenDen/IID.
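
To make the structure of the combined objective $\mathcal{L} = \mathcal{L}_{DM/IDM} + \lambda_{CC} \mathcal{L}_{CC} + \lambda_{CM} \mathcal{L}_{CM}$ more concrete, the following is a minimal NumPy sketch for the features of a single class. It is not the authors' implementation: this summary does not give the exact forms of $\mathcal{L}_{CC}$ and $\mathcal{L}_{CM}$, so the squared-distance-to-centre term, the reading of "local" as covariance over small groups of feature dimensions, and all function names, default weights, and the `group_size` parameter are assumptions made for illustration.

```python
import numpy as np


def mean_matching_loss(real_feats, syn_feats):
    """Squared distance between mean features (a DM-style term)."""
    diff = real_feats.mean(axis=0) - syn_feats.mean(axis=0)
    return float(np.sum(diff ** 2))


def class_centralization_loss(syn_feats):
    """Average squared distance of synthetic features to their class centre.

    Assumed form: the paper's L_CC may be defined differently.
    """
    centre = syn_feats.mean(axis=0, keepdims=True)
    return float(np.mean(np.sum((syn_feats - centre) ** 2, axis=1)))


def _covariance(feats):
    """Biased covariance matrix, well defined even for a single sample."""
    centred = feats - feats.mean(axis=0, keepdims=True)
    return centred.T @ centred / feats.shape[0]


def local_covariance_loss(real_feats, syn_feats, group_size):
    """Match covariance matrices computed on small groups of feature dimensions.

    Assumption: "local" is read as splitting the feature vector into
    consecutive groups, which keeps each matrix small when there are
    far fewer samples than feature dimensions.
    """
    n_dims = real_feats.shape[1]
    total, n_groups = 0.0, 0
    for start in range(0, n_dims, group_size):
        stop = start + group_size
        cov_real = _covariance(real_feats[:, start:stop])
        cov_syn = _covariance(syn_feats[:, start:stop])
        total += float(np.sum((cov_real - cov_syn) ** 2))
        n_groups += 1
    return total / n_groups


def total_loss(real_feats, syn_feats, lam_cc=0.1, lam_cm=0.1, group_size=4):
    """Weighted sum of the three terms (hypothetical default weights)."""
    return (
        mean_matching_loss(real_feats, syn_feats)
        + lam_cc * class_centralization_loss(syn_feats)
        + lam_cm * local_covariance_loss(real_feats, syn_feats, group_size)
    )
```

In a real distillation loop these terms would be computed per class on features from an embedding network and differentiated with respect to the synthetic images, for example in PyTorch; the NumPy version only shows how the three terms are formed and combined.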

Paper Structure (15 sections, 9 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: t-SNE visualization of features from the synthetic datasets obtained by DM [zhao2023dataset] and by our method, using a pre-trained ResNet18 on CIFAR10. Different colors represent different classes. IPC denotes the number of images per class.
  • Figure 2: Illustration of the proposed covariance matching constraint. This constraint involves calculating local covariance matrices for corresponding classes in both real and synthetic datasets, followed by matching these matrices.
  • Figure 3: Visualization of different $\beta$ on CIFAR10 with IPC=10.
  • Figure 4: Ablation study of the weighting parameter on CIFAR10.
  • Figure 5: Accuracy progression over iteration on CIFAR10.
  • ...and 3 more figures