Enhancing Dataset Distillation via Non-Critical Region Refinement

Minh-Tuan Tran, Trung Le, Xuan-May Le, Thanh-Toan Do, Dinh Phung

TL;DR

This work tackles dataset distillation by balancing instance-specific and class-general information through the NRR-DD framework, which integrates Critical-based Initial Data Discovery, Non-Critical Region Refinement, and Relabeling. A key advance is Distance-Based Representative (DBR) knowledge transfer, which avoids heavy soft-label storage by keeping only two distance values per instance to guide student-teacher alignment. Empirical results show that NRR-DD achieves state-of-the-art performance on both small- and large-scale datasets (e.g., ~60.2% top-1 accuracy on ImageNet-1k with ResNet-18) while reducing memory demands by up to 500x compared to soft-label approaches. Overall, the method preserves fine-grained details while enriching non-critical regions, offering a scalable, efficient path to high-quality dataset distillation that is applicable to diverse training environments.
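The up-to-500x memory figure is in line with simple per-view arithmetic on ImageNet-1k's 1000 classes. A minimal sketch, assuming a stored soft label is one full $C$-dimensional float vector and that DBR replaces it with exactly two floats ($d^T_\text{org}$ and $d^T_\text{aug}$); the helper names are hypothetical, and the sketch ignores any other per-image overhead.

```python
# Hypothetical per-view label-storage comparison (not the authors' code).
# Assumption: a soft label is a full C-dimensional float vector, while DBR
# stores exactly two floats per instance: d^T_org and d^T_aug.


def soft_label_floats(num_classes: int) -> int:
    """Floats needed for one full soft-label vector."""
    return num_classes


def dbr_floats() -> int:
    """Floats needed for the two stored DBR distances."""
    return 2


def storage_ratio(num_classes: int) -> float:
    """How many times fewer floats DBR stores than a soft-label vector."""
    return soft_label_floats(num_classes) / dbr_floats()


if __name__ == "__main__":
    for c in (10, 100, 1000):  # C = 1000 corresponds to ImageNet-1k
        print(f"C={c}: {storage_ratio(c):.0f}x fewer floats")
```

Under this assumption the saving scales linearly with the number of classes, so it is largest on ImageNet-1k and shrinks proportionally on datasets with fewer classes.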

Abstract

Dataset distillation has become a popular method for compressing large datasets into smaller, more efficient representations while preserving critical information for model training. Data features can be broadly categorized into two types: instance-specific features, which capture the unique, fine-grained details of individual examples, and class-general features, which represent broad patterns shared across a class. However, previous approaches often struggle to balance these features: some focus solely on class-general patterns and neglect finer instance details, while others prioritize instance-specific features and overlook the shared characteristics essential for class-level understanding. In this paper, we introduce the Non-Critical Region Refinement Dataset Distillation (NRR-DD) method, which preserves instance-specific details and fine-grained regions in synthetic data while enriching non-critical regions with class-general information. This approach enables models to leverage the information in every pixel, capturing both feature types and improving overall performance. Additionally, we present Distance-Based Representative (DBR) knowledge transfer, which eliminates the need for soft labels in training by relying on the distance between predictions on synthetic data and one-hot encoded labels. Experimental results show that NRR-DD achieves state-of-the-art performance on both small- and large-scale datasets. Furthermore, by storing only two distances per instance, our method delivers results comparable to soft-label training across various settings. The code is available at https://github.com/tmtuan1307/NRR-DD.
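To make the DBR idea concrete, the sketch below computes a teacher's two distances for one mixed sample once (the only values stored) and then scores a student by how well it reproduces them. It assumes (i) the "distance" to a one-hot label is cross-entropy, which equals the KL divergence from the one-hot vector to the prediction; (ii) $d_\text{org}$ and $d_\text{aug}$ are measured against the one-hot labels of the original class and of the class mixed in by augmentation; and (iii) the student is penalized by the squared gap between its distances and the stored ones. None of these choices is confirmed by the text above, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distance_to_one_hot(probs: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """Cross-entropy between a one-hot label and a predicted distribution.

    Assumption: this is the DBR 'distance'; it equals KL(one_hot || probs).
    """
    return -math.log(max(probs[label], eps))


def dbr_targets(teacher_logits: Sequence[float], y_org: int, y_aug: int) -> Tuple[float, float]:
    """Compute the two values stored per instance: (d^T_org, d^T_aug)."""
    p = softmax(teacher_logits)
    return distance_to_one_hot(p, y_org), distance_to_one_hot(p, y_aug)


def dbr_loss(student_logits: Sequence[float], stored: Tuple[float, float],
             y_org: int, y_aug: int) -> float:
    """Squared gap between the student's distances and the stored teacher distances."""
    p = softmax(student_logits)
    d_t_org, d_t_aug = stored
    d_s_org = distance_to_one_hot(p, y_org)
    d_s_aug = distance_to_one_hot(p, y_aug)
    return (d_s_org - d_t_org) ** 2 + (d_s_aug - d_t_aug) ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]            # hypothetical teacher logits for one mixed sample
    stored = dbr_targets(teacher, 0, 1)  # only these two floats are kept
    print(dbr_loss(teacher, stored, 0, 1))          # 0.0: student matches teacher
    print(dbr_loss([0.0, 0.0, 0.0], stored, 0, 1))  # positive: student differs
```

In this sketch the loss is zero for any student whose probabilities on the two labelled classes match the teacher's, regardless of how it distributes the remaining probability mass; that is the trade-off of keeping two scalars instead of a full teacher distribution.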

Paper Structure

This paper contains 23 sections, 10 equations, 4 figures, 13 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Comparison of our method with two popular frameworks, SRe$^2$L [sre2l] and RDED [rded], for generating synthetic datasets. RDED selects high-confidence, easily classifiable images, while our method focuses on low-confidence, harder-to-classify samples, which helps reduce overfitting and improve model accuracy. Additionally, RDED targets instance-specific features without refinement, and SRe$^2$L updates all pixels to capture class-general features, often at the expense of fine details. In contrast, our NRR-DD method preserves fine-grained details while capturing class-general features by updating only non-critical pixels.
  • Figure 2: The architecture of our NRR-DD consists of three key stages: (i) Critical-based Initial Data Discovery (Section \ref{sec:ids}), which selects patches with a high CAM ratio but a low confidence level to capture instance-specific features; (ii) Non-Critical Region Refinement (Section \ref{sec:nrr}), where CAM [cam] is used to distinguish critical from non-critical regions, preserving fine-grained details in critical regions while enriching non-critical areas with class-general information; (iii) Knowledge Transfer, which minimizes the distance between $\mathcal{S}(\tilde{x}_\text{mix})$ (the student prediction) and $\mathcal{T}(\tilde{x}_\text{mix})$ (the pretrained teacher prediction or soft label) by reducing the gap between $d^T_\text{org}$ and $d^S_\text{org}$ and between $d^T_\text{aug}$ and $d^S_\text{aug}$. By storing only two values, $d^T_\text{org}$ and $d^T_\text{aug}$, the new model can effectively mimic the performance of the pretrained one.
  • Figure 3: Visualization of images from the 'tench' and 'English springer' classes synthesized using various dataset distillation methods, including SRe$^2$L [sre2l], RDED [rded], and our NRR-DD. For additional visualizations, please refer to the Supplementary Material.
  • Figure 4: Visualization with various values of $\epsilon$.
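The first two stages described in the Figure 2 caption can be sketched on toy data: pick an initial patch that the CAM marks as important but on which the teacher is not confident, then refine only the pixels the CAM marks as non-critical. This is a minimal sketch, assuming "CAM ratio" is a per-patch score in [0, 1], "confidence" is the teacher's true-class probability, and a pixel is non-critical when its CAM activation falls below a threshold `tau`. The names `min_cam_ratio`, `tau`, and every function are hypothetical; the relation between `tau` and the $\epsilon$ of Figure 4 is not established here; and the gradients are placeholders for whatever class-general objective the refinement actually optimizes.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Patch:
    patch_id: int
    cam_ratio: float   # assumed: share of class activation covered by this patch
    confidence: float  # assumed: teacher probability of the true class on this patch


def select_initial_patch(patches: Sequence[Patch], min_cam_ratio: float) -> Patch:
    """Keep patches with a high enough CAM ratio, then take the least confident one."""
    candidates = [p for p in patches if p.cam_ratio >= min_cam_ratio]
    if not candidates:
        raise ValueError("no patch meets min_cam_ratio")
    return min(candidates, key=lambda p: p.confidence)


def non_critical_mask(cam: Sequence[float], tau: float) -> List[bool]:
    """True where a pixel is non-critical (CAM activation below tau)."""
    return [a < tau for a in cam]


def refine_step(pixels: Sequence[float], grads: Sequence[float],
                mask: Sequence[bool], lr: float) -> List[float]:
    """One gradient-descent step that changes only non-critical pixels."""
    return [x - lr * g if m else x for x, g, m in zip(pixels, grads, mask)]


if __name__ == "__main__":
    patches = [Patch(0, 0.9, 0.95), Patch(1, 0.8, 0.40), Patch(2, 0.2, 0.10)]
    print(select_initial_patch(patches, min_cam_ratio=0.5).patch_id)  # 1

    cam = [0.9, 0.1, 0.6, 0.2]              # toy CAM over a flattened 4-pixel image
    mask = non_critical_mask(cam, tau=0.5)  # [False, True, False, True]
    print(refine_step([1.0] * 4, [1.0] * 4, mask, lr=0.5))  # [1.0, 0.5, 1.0, 0.5]
```

In this sketch, filtering by CAM ratio before minimizing confidence keeps the low-confidence choice from drifting to patches with little class evidence: patch 2 has the lowest confidence but is rejected for its low CAM ratio. The masked update leaves critical pixels exactly unchanged, mirroring the caption's "updating only non-critical pixels".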