Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios

Kai Wang, Zekai Li, Zhi-Qi Cheng, Samir Khaki, Ahmad Sajedi, Ramakrishna Vedantam, Konstantinos N Plataniotis, Alexander Hauptmann, Yang You

TL;DR

This work tackles the challenge of dataset distillation in complex visual scenarios by introducing EDF, a method that emphasizes discriminative features via Grad-CAM-guided gradient weighting and selective supervision. EDF combines Common Pattern Dropout to suppress low-loss, non-discriminative signals with Discriminative Area Enhancement to bias updates toward highly activated regions, improving the fidelity of distilled data on complex datasets. To benchmark performance in realistic settings, the authors propose Comp-DD, a suite of ImageNet-1K subsets organized by complexity, and demonstrate SOTA gains, including lossless results on several subsets. Overall, EDF advances practical DD by targeting discriminative regions, providing a scalable benchmark, and showing robust cross-architecture generalization. The approach offers a concrete pathway to deploying compact distilled datasets in real-world, complex recognition tasks.

Abstract

Dataset distillation has demonstrated strong performance on simple datasets like CIFAR, MNIST, and TinyImageNet but struggles to achieve similar results in more complex scenarios. In this paper, we propose EDF (Emphasize Discriminative Features), a dataset distillation method that enhances key discriminative regions in synthetic images using Grad-CAM activation maps. Our approach is inspired by a key observation: in simple datasets, high-activation areas typically occupy most of the image, whereas in complex scenarios, these areas are much smaller. Unlike previous methods that treat all pixels equally when synthesizing images, EDF uses Grad-CAM activation maps to enhance high-activation areas. From a supervision perspective, we downplay supervision signals that have lower losses, as they contain common patterns. Additionally, to help the DD community better explore complex scenarios, we build the Complex Dataset Distillation (Comp-DD) benchmark by meticulously selecting sixteen subsets, eight easy and eight hard, from ImageNet-1K. In particular, EDF consistently outperforms SOTA results in complex scenarios, such as ImageNet-1K subsets. We hope this work inspires and encourages more researchers to improve the practicality and efficacy of DD. Our code and benchmark will be made public at https://github.com/NUS-HPC-AI-Lab/EDF.
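To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that "downplaying" low-loss supervision means zeroing out the lowest-loss fraction of per-element losses, and that "enhancing" high-activation areas means multiplying pixel gradients by a factor wherever the activation is at least the map's mean. The function names, the `drop_ratio` parameter, and the exact thresholding rule are hypothetical and chosen only for illustration.

```python
import numpy as np


def common_pattern_dropout(losses, drop_ratio):
    """Zero out the lowest-loss fraction of losses (assumed dropout rule)."""
    losses = np.asarray(losses, dtype=float)
    n_drop = int(drop_ratio * losses.size)
    kept = losses.copy().ravel()
    if n_drop > 0:
        # Indices of the n_drop smallest losses, treated here as "common patterns".
        drop_idx = np.argsort(kept)[:n_drop]
        kept[drop_idx] = 0.0
    return kept.reshape(losses.shape)


def discriminative_area_enhancement(grad, activation, beta):
    """Scale gradients by beta where activation >= mean activation (assumed rule)."""
    grad = np.asarray(grad, dtype=float)
    activation = np.asarray(activation, dtype=float)
    # Pixels at or above the mean activation count as discriminative here.
    weights = np.where(activation >= activation.mean(), beta, 1.0)
    # An (H, W) weight map broadcasts over a (C, H, W) gradient.
    return grad * weights


if __name__ == "__main__":
    print(common_pattern_dropout([0.1, 0.9, 0.3, 0.7], drop_ratio=0.5))
    act = np.array([[0.1, 0.9], [0.2, 0.8]])
    print(discriminative_area_enhancement(np.ones((2, 2)), act, beta=2.0))
```

In this sketch the activation map is passed in as a plain array; in practice it would come from a Grad-CAM computation on a trained model, which is omitted here.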

Paper Structure

This paper contains 59 sections, 5 equations, 13 figures, 15 tables, 1 algorithm.

Figures (13)

  • Figure 1: (a) DD recovery ratio (distilled data accuracy over full data accuracy) comparison between CIFAR-10 and IN1K-CIFAR-10. We use trajectory matching for demonstration. (b) Comparison between Grad-CAM activation map statistics of CIFAR-10 and IN1K-CIFAR-10. The ratio refers to the percentage of pixels whose activation values are higher than 0.5.
  • Figure 2: (a) Grad-CAM activation maps of the image with initialization, high-loss supervision distillation, and low-loss supervision distillation. (b) t-SNE visualization of image features with only low-loss supervision. Different colors represent different classes. The value at the top right is the inter-class distance, computed as the average of point-wise distances.
  • Figure 3: Workflow of Emphasize Discriminative Features (EDF). EDF comprises two modules: (1) Common Pattern Dropout, which filters out low-loss signals, and (2) Discriminative Area Enhancement, which amplifies gradients in critical regions. $\beta$ denotes the enhancement factor. "mean" denotes the mean activation value of the activation map.
  • Figure 4: (a) Statistics of the training set in the Comp-DD benchmark. Each subset contains 500 images in the validation set. (b) Comparison of subset-level complexity between easy and hard subsets across all categories. The complexity of hard subsets is higher than that of easy subsets.
  • Figure 5: (a) Comparison between high-loss and low-loss supervision distilled images. (b) Comparison of discriminative areas in images produced by initialization, DATM, and EDF. Figures at the bottom are increments made by EDF over the initial image.
  • ...and 8 more figures
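
The quantities named in the captions of Figures 1 and 2 can be written out as small helper functions. This is a hedged sketch: the recovery ratio and the 0.5 activation threshold follow the captions directly, while the inter-class distance is one plausible reading of "the average of point-wise distances" (the mean Euclidean distance over all pairs of points with different labels). The function names are hypothetical.

```python
import numpy as np


def recovery_ratio(distilled_acc, full_acc):
    """Distilled-data accuracy divided by full-data accuracy (Figure 1a)."""
    return distilled_acc / full_acc


def high_activation_ratio(activation, threshold=0.5):
    """Fraction of pixels whose activation exceeds the threshold (Figure 1b)."""
    activation = np.asarray(activation, dtype=float)
    # Multiply by 100 to express the result as a percentage.
    return float((activation > threshold).mean())


def inter_class_distance(features, labels):
    """Mean Euclidean distance over pairs with different labels (assumed definition)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise distance matrix of shape (N, N).
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    different = labels[:, None] != labels[None, :]
    if not different.any():
        raise ValueError("need at least two distinct classes")
    return float(dists[different].mean())


if __name__ == "__main__":
    print(recovery_ratio(40.0, 80.0))
    print(high_activation_ratio([[0.1, 0.9], [0.6, 0.5]]))
    print(inter_class_distance([[0, 0], [3, 4], [0, 0]], [0, 1, 0]))
```

The pairwise matrix costs O(N²) memory, which is acceptable for the small feature sets typically plotted with t-SNE but would need batching for larger ones.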