FD$^2$: A Dedicated Framework for Fine-Grained Dataset Distillation

Hongxu Ma, Guang Li, Shijie Wang, Dongzhan Zhou, Baoli Sun, Takahiro Ogawa, Miki Haseyama, Zhihui Wang

Abstract

Dataset distillation (DD) compresses a large training set into a small synthetic set, reducing storage and training cost, and has shown strong results on general benchmarks. Decoupled DD further improves efficiency by splitting the pipeline into pretraining, sample distillation, and soft-label generation. However, existing decoupled methods largely rely on coarse class-label supervision and optimize samples within each class in a nearly identical manner. On fine-grained datasets, this often yields distilled samples that (i) retain large intra-class variation with subtle inter-class differences and (ii) become overly similar within the same class, limiting localized discriminative cues and hurting recognition. To address these problems, we propose FD$^{2}$, a dedicated framework for Fine-grained Dataset Distillation. FD$^{2}$ localizes discriminative regions and constructs fine-grained representations for distillation. During pretraining, counterfactual attention learning aggregates discriminative representations to update class prototypes. During distillation, a fine-grained characteristic constraint aligns each sample with its class prototype while repelling others, and a similarity constraint diversifies attention across same-class samples. Experiments on multiple fine-grained and general datasets show that FD$^{2}$ integrates seamlessly with decoupled DD and improves performance in most settings, indicating strong transferability.
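
The two distillation constraints described above can be pictured concretely. Below is a minimal PyTorch-style sketch, not the paper's implementation: the loss forms, tensor shapes, and the temperature `tau` are assumptions chosen only to illustrate prototype alignment/separation and attention diversification.

```python
# Hedged sketch of FD^2-style constraints (illustrative assumptions, not the paper's exact losses).
import torch
import torch.nn.functional as F

def characteristic_constraint(feats, labels, prototypes, tau=0.1):
    """Pull each distilled feature toward its class prototype, push it from the rest.

    feats: (B, D) features of distilled samples; labels: (B,) class indices;
    prototypes: (C, D) class prototypes maintained during pretraining.
    """
    feats = F.normalize(feats, dim=1)
    protos = F.normalize(prototypes, dim=1)
    logits = feats @ protos.t() / tau        # (B, C) cosine similarity to every prototype
    # cross-entropy against the own-class prototype aligns with it and repels the others
    return F.cross_entropy(logits, labels)

def similarity_constraint(attn_maps):
    """Encourage same-class distilled samples to attend to different regions.

    attn_maps: (N_S, H, W) attention maps of the N_S samples distilled for one class.
    """
    a = F.normalize(attn_maps.flatten(1), dim=1)   # (N_S, H*W)
    sim = a @ a.t()                                # pairwise cosine similarity
    n = a.size(0)
    if n < 2:
        return sim.new_zeros(())
    off_diag = sim - torch.diag(torch.diag(sim))
    # penalizing mean pairwise similarity diversifies attention across the group
    return off_diag.sum() / (n * (n - 1))
```

In a decoupled DD pipeline, terms like these would be added to the usual class-supervision loss during the sample-distillation stage; how they are weighted is likewise an assumption here.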

Paper Structure

This paper contains 49 sections, 4 theorems, 39 equations, 13 figures, 10 tables, 1 algorithm.

Key Result

Proposition 1

Let $z_i$ be a distilled feature with label $y_i$, let $r_i=\lVert z_i-c_{y_i}\rVert$ denote its distance to its own class center, and let $d_{ij}=\lVert z_i-c_{j}\rVert$ ($j\neq y_i$) denote its distances to the other class centers. Then the margin in eq:avg_center_margin admits the lower bound $\bar{\Delta}_i \geq \min_{j\neq y_i} d_{ij} - r_i$. Hence, decreasing $r_i$ or enlarging $\{d_{ij}\}_{j\neq y_i}$ increases $\bar{\Delta}_i$. In particular, if $r_i < \min_{j\neq y_i} d_{ij}$, then $\bar{\Delta}_i>0$, implying that $z_i$ is, on average, closer to its own class center than to other class centers.
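
The bound can be checked numerically. The sketch below assumes $\bar{\Delta}_i$ is the average distance to other class centers minus the own-center distance (one plausible reading of eq:avg_center_margin, not quoted from the paper); the synthetic data and dimensions are arbitrary.

```python
# Numeric sanity check of the reconstructed margin bound (assumed definitions, illustrative only).
import numpy as np

rng = np.random.default_rng(0)
C, D = 5, 16                                     # number of classes, feature dimension
centers = rng.normal(size=(C, D))                # class centers c_j
y_i = 2
z_i = centers[y_i] + 0.1 * rng.normal(size=D)    # a feature lying near its own center

r_i = np.linalg.norm(z_i - centers[y_i])                               # distance to own center
d_ij = np.linalg.norm(z_i - np.delete(centers, y_i, axis=0), axis=1)   # distances to other centers

delta_bar = d_ij.mean() - r_i          # assumed form of the average-center margin
lower_bound = d_ij.min() - r_i
assert delta_bar >= lower_bound        # the mean over j is never below the minimum
if r_i < d_ij.min():                   # the sufficient condition in the proposition
    assert delta_bar > 0
print(f"r_i={r_i:.3f}  min d_ij={d_ij.min():.3f}  Delta_bar={delta_bar:.3f} >= {lower_bound:.3f}")
```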

Figures (13)

  • Figure 1: (a) On CUB-200-2011, 10 classes are randomly sampled to compare the original training set and the SRe$^{2}$L++ distilled set, where intra-class dispersion (bars) and inter-class distance (line) are reported. (b) Attention heatmaps of Black-footed Albatross in CUB-200-2011, comparing the attended regions of distilled samples produced by SRe$^{2}$L++ and FD$^{2}$.
  • Figure 1: Visualization of distilled samples from the first 100 classes on CUB-200-2011 obtained by SRe$^{2}$L++$_\mathrm{FD^{2}}$ at IPC$=1$.
  • Figure 2: Overview of FD$^{2}$. (1) We pretrain a Backbone+CAL teacher and maintain class prototypes online. (2) We distill images in same-class groups of size $N_S$, adding a fine-grained characteristic constraint (prototype alignment or separation) and a similarity constraint (diverse attention), together with class supervision from both branches. (3) We generate soft labels using the backbone branch.
  • Figure 2: Visualization of distilled samples from the last 100 classes on CUB-200-2011 obtained by SRe$^{2}$L++$_\mathrm{FD^{2}}$ at IPC$=1$.
  • Figure 3: (a) t-SNE feature distribution. (b) Nearest-neighbor center distance of each class for distilled images on CUB-200-2011.
  • ...and 8 more figures

Theorems & Definitions (4)

  • Proposition 1
  • Corollary 1
  • Proposition 2
  • Corollary 2