Accelerating Large-Scale Dataset Distillation via Exploration-Exploitation Optimization

Muhammad J. Alahmadi, Peng Gao, Feiyi Wang, Dongkuan Xu

TL;DR

The paper addresses the efficiency-accuracy gap in large-scale dataset distillation by introducing Exploration--Exploitation Distillation (E$^2$D). It combines full-image initialization with a two-phase optimization: an exploration phase that applies uniform updates and identifies high-loss regions, followed by an exploitation phase that concentrates updates on those regions to accelerate convergence and reduce redundant computation. Empirically, E$^2$D achieves state-of-the-art accuracy on ImageNet-1K with up to 18× faster synthesis, and it substantially improves accuracy on ImageNet-21K while remaining 4.3× faster, with benefits across multiple architectures. The results indicate that targeted, redundancy-reducing updates can bridge the accuracy-efficiency gap; the approach is broadly compatible with existing distillation pipelines, enabling practical deployment under resource constraints.

Abstract

Dataset distillation compresses the original data into compact synthetic datasets, reducing training time and storage while retaining model performance, enabling deployment under limited resources. Although recent decoupling-based distillation methods enable dataset distillation at large scale, they continue to face an efficiency gap: optimization-based decoupling methods achieve higher accuracy but demand intensive computation, whereas optimization-free decoupling methods are efficient but sacrifice accuracy. To overcome this trade-off, we propose Exploration--Exploitation Distillation (E$^2$D), a simple, practical method that minimizes redundant computation through an efficient pipeline that begins with full-image initialization to preserve semantic integrity and feature diversity. It then uses a two-phase optimization strategy: an exploration phase that performs uniform updates and identifies high-loss regions, and an exploitation phase that focuses updates on these regions to accelerate convergence. We evaluate E$^2$D on large-scale benchmarks, surpassing the state-of-the-art on ImageNet-1K while being $18\times$ faster, and on ImageNet-21K, our method substantially improves accuracy while remaining $4.3\times$ faster. These results demonstrate that targeted, redundancy-reducing updates, rather than brute-force optimization, bridge the gap between accuracy and efficiency in large-scale dataset distillation. Code is available at https://github.com/ncsu-dk-lab/E2D.
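The two-phase schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `two_phase_update`, the scalar "regions", the quadratic loss, the step counts, and the choice to re-rank regions at every exploitation step are all assumptions made for illustration only.

```python
import random


def two_phase_update(targets, explore_steps=5, exploit_steps=20,
                     top_k=2, lr=0.3, seed=0):
    """Toy exploration-exploitation schedule (hypothetical, illustration only).

    Each "region" i is a single scalar x[i] with loss (x[i] - targets[i])**2.
    Returns the final values, the per-region losses, and the update count.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in targets]

    def loss(i):
        return (x[i] - targets[i]) ** 2

    def step(i):
        # One gradient-descent step on the quadratic loss of region i.
        x[i] -= lr * 2.0 * (x[i] - targets[i])

    updates = 0

    # Exploration: every region receives the same number of updates.
    for _ in range(explore_steps):
        for i in range(len(x)):
            step(i)
            updates += 1

    # Exploitation: only the top_k highest-loss regions are updated.
    # Re-ranking at every step is an assumption of this sketch.
    for _ in range(exploit_steps):
        hardest = sorted(range(len(x)), key=loss, reverse=True)[:top_k]
        for i in hardest:
            step(i)
            updates += 1

    return x, [loss(i) for i in range(len(x))], updates


values, losses, n_updates = two_phase_update([0.0, 5.0, -3.0, 10.0])
```

With the defaults above, the sketch spends 5 × 4 uniform updates in exploration and 20 × 2 targeted updates in exploitation, instead of updating all four regions at every step.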

Paper Structure

This paper contains 19 sections, 2 equations, 9 figures, 15 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Comparison of Top-1 accuracy and synthesis time on ImageNet-1K using ResNet-18 for various dataset distillation methods at IPC 10 and IPC 50. Synthesis time is measured on a single RTX A6000 GPU. Our method converges substantially faster and achieves the highest accuracy, leading to the best accuracy–efficiency trade‑off.
  • Figure 2: Semantic cosine similarity across ImageNet‑1K classes at IPC 50 using a ResNet‑18 teacher. Lower values indicate greater diversity and reduced redundancy; our method consistently achieves the lowest similarity.
  • Figure 3: Visual comparison of synthetic data generated by SRe$^2$L, RDED, DELT, EDC, and our method E$^2$D. Our method produces more diverse, less redundant samples and preserves semantic integrity with full‑size feature representations.
  • Figure 4: Overview of our proposed method E$^2$D. The pipeline consists of four components: (1) Full-size Image Initialization, which preserves the semantic and structural information of the original data, preventing distortion or redundancy; (2) Exploration Phase, which identifies challenging high-loss regions and ensures balanced optimization; (3) Exploitation Phase, which iteratively refines these challenging regions for efficient convergence; and (4) Accelerated Learning Schedule, applied during student training to further speed up convergence. Together, these components enable fast and effective dataset distillation with minimal redundancy.
  • Figure 5: Cosine similarity trends across optimization steps.
  • ...and 4 more figures
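
The Figure 2 caption uses cosine similarity as a redundancy measure, where lower values indicate greater diversity. One common way to compute such a score is the mean pairwise cosine similarity over feature vectors, sketched below. The helper names `cosine` and `mean_pairwise_cosine` are hypothetical, and the paper's exact protocol (which features are extracted from the ResNet-18 teacher and how they are aggregated per class) is not given in this excerpt.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(features):
    """Average cosine similarity over all unordered pairs of vectors.

    Assumes at least two vectors and no all-zero vector.
    """
    pairs = list(combinations(features, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Made-up 2-D "features": one redundant set and one more diverse set.
redundant = [[1.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

redundant_score = mean_pairwise_cosine(redundant)
diverse_score = mean_pairwise_cosine(diverse)
```

In this toy example the vectors that all point in the same direction score higher than the spread-out set, matching the caption's reading that lower similarity means less redundancy.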