Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation

Xinhao Zhong, Shuoyang Sun, Xulin Gu, Chenyang Zhu, Bin Chen, Yaowei Wang

TL;DR

This work identifies pervasive inconsistencies in the post-evaluation protocols of decoupled dataset distillation and proposes RD$^3$, a unified evaluation framework and benchmark. By re-evaluating representative methods across diverse datasets, images-per-class (IPC) settings, and architectures under standardized post-evaluation settings, the authors show that many reported gains stem from evaluation choices rather than from the intrinsic quality of the synthetic data. The study also highlights critical factors such as initialization, hybrid soft labels, loss functions, learning-rate (LR) scheduling, and data augmentation, and argues that time-to-synthesize is a practical efficiency metric whose differences across methods can outweigh their accuracy gains. Overall, RD$^3$ provides a fair, reproducible foundation for comparing distillation methods and for guiding future work toward genuinely better synthetic data.

Abstract

Dataset distillation aims to generate compact synthetic datasets that enable models trained on them to achieve performance comparable to those trained on full real datasets, while substantially reducing storage and computational costs. Early bi-level optimization methods (e.g., MTT) have shown promising results on small-scale datasets, but their scalability is limited by high computational overhead. To address this limitation, recent decoupled dataset distillation methods (e.g., SRe$^2$L) separate the teacher model pre-training from the synthetic data generation process. These methods also introduce random data augmentation and epoch-wise soft labels during the post-evaluation phase to improve performance and generalization. However, existing decoupled distillation methods suffer from inconsistent post-evaluation protocols, which hinders progress in the field. In this work, we propose Rectified Decoupled Dataset Distillation (RD$^3$), and systematically investigate how different post-evaluation settings affect test accuracy. We further examine whether the reported performance differences across existing methods reflect true methodological advances or stem from discrepancies in evaluation procedures. Our analysis reveals that much of the performance variation can be attributed to inconsistent evaluation rather than differences in the intrinsic quality of the synthetic data. In addition, we identify general strategies that improve the effectiveness of distilled datasets across settings. By establishing a standardized benchmark and rigorous evaluation protocol, RD$^3$ provides a foundation for fair and reproducible comparisons in future dataset distillation research.
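The post-evaluation recipe described above (random data augmentation plus epoch-wise soft labels) can be illustrated with a minimal NumPy sketch. It assumes an SRe$^2$L/FKD-style setup in which the teacher re-labels each epoch's freshly augmented views and the student minimizes a temperature-scaled KL divergence to those soft labels. Everything concrete here is hypothetical and chosen only to keep the example small: the linear teacher and student, random horizontal flips instead of the random-resized crops used in practice, and the `temperature` and `lr` values. It is not the RD$^3$ protocol itself.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of temperature-scaled logits."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_label_kl(student_logits, teacher_logits, temperature=4.0):
    """Batch-mean KL(p_teacher || p_student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    return float(kl.mean())

def random_hflip(images, rng):
    """Hypothetical augmentation: flip each (H, W) image horizontally with probability 0.5."""
    out = images.copy()
    flip = rng.random(len(images)) < 0.5
    out[flip] = out[flip][:, :, ::-1]
    return out

def post_eval_epoch(images, teacher_w, student_w, rng, temperature=4.0, lr=0.5):
    """One full-batch epoch with epoch-wise soft labels; returns the loss before the update."""
    views = random_hflip(images, rng)          # fresh random augmentation every epoch
    x = views.reshape(len(views), -1)
    teacher_logits = x @ teacher_w             # soft labels re-generated for this epoch's views
    student_logits = x @ student_w
    # Gradient of the batch-mean KL w.r.t. student logits is (p_s - p_t) / (T * N).
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    grad_logits = (p_s - p_t) / (temperature * len(x))
    student_w -= lr * x.T @ grad_logits        # in-place gradient step on the linear student
    return soft_label_kl(student_logits, teacher_logits, temperature)

# Toy usage: 20 hypothetical 4x4 "synthetic images", 5 classes.
rng = np.random.default_rng(0)
images = rng.normal(size=(20, 4, 4))
teacher_w = rng.normal(size=(16, 5))           # stand-in for a pre-trained teacher
student_w = np.zeros((16, 5))                  # student trained only on the synthetic set
losses = [post_eval_epoch(images, teacher_w, student_w, rng) for _ in range(200)]
print(f"first-epoch loss {losses[0]:.4f}, last-epoch loss {losses[-1]:.4f}")
```

Because the augmentation is resampled every epoch, the teacher's soft labels change from epoch to epoch; this is the sense in which the labels are epoch-wise rather than fixed once per synthetic image.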

Paper Structure

This paper contains 48 sections, 5 equations, 19 figures, 18 tables.

Figures (19)

  • Figure 1: Performance comparison of various distillation methods evaluated by ResNet-18 on ImageNet-1K under IPC=10. Previously reported results show a 27.3% performance improvement, but this gain is influenced by multiple factors. After fairly re-evaluating all methods under a unified setting, we obtain a rectified improvement of 6.7%.
  • Figure 2: Performance comparison between SRe$^2$L and RDED on ImageNet-1K under IPC=10, evaluated by ResNet-18 with the same post-evaluation settings. The techniques added incrementally from left to right have different impacts on performance.
  • Figure 3: Comparison of the effectiveness and efficiency of all the decoupled distillation methods. The upper-left quadrant represents the optimal effectiveness-efficiency balance.
  • Figure 4: Performance on ImageNet-1K under IPC=10 with different smoothing factor $\zeta$.
  • Figure 5: Visual comparison of the class "ostrich" across different distillation methods using various initializations.
  • ...and 14 more figures