Accelerating Large-Scale Dataset Distillation via Exploration-Exploitation Optimization
Muhammad J. Alahmadi, Peng Gao, Feiyi Wang, Dongkuan Xu
TL;DR
The paper addresses the efficiency-accuracy gap in large-scale dataset distillation by introducing Exploration--Exploitation Distillation (E$^2$D). It combines full-image initialization with a two-phase optimization: an exploration phase that applies uniform updates and identifies high-loss regions, followed by an exploitation phase that concentrates updates on those regions to accelerate convergence and reduce redundant updates. Empirically, E$^2$D achieves state-of-the-art accuracy on ImageNet-1K with up to 18× synthesis-time speedup and delivers substantial gains on ImageNet-21K while remaining 4.3× faster, with benefits across multiple architectures. These results indicate that targeted, redundancy-reducing updates can bridge the accuracy-efficiency gap. The approach is broadly compatible with existing distillation pipelines, enabling practical deployment under resource constraints.
Abstract
Dataset distillation compresses the original data into compact synthetic datasets, reducing training time and storage while retaining model performance and enabling deployment under limited resources. Although recent decoupling-based distillation methods enable dataset distillation at large scale, they continue to face an efficiency gap: optimization-based decoupling methods achieve higher accuracy but demand intensive computation, whereas optimization-free decoupling methods are efficient but sacrifice accuracy. To overcome this trade-off, we propose Exploration--Exploitation Distillation (E$^2$D), a simple, practical method that minimizes redundant computation. Its pipeline begins with full-image initialization to preserve semantic integrity and feature diversity. It then uses a two-phase optimization strategy: an exploration phase that performs uniform updates and identifies high-loss regions, and an exploitation phase that focuses updates on these regions to accelerate convergence. We evaluate E$^2$D on large-scale benchmarks: on ImageNet-1K it surpasses the state of the art while being $18\times$ faster, and on ImageNet-21K it substantially improves accuracy while remaining $4.3\times$ faster. These results demonstrate that targeted, redundancy-reducing updates, rather than brute-force optimization, bridge the gap between accuracy and efficiency in large-scale dataset distillation. Code is available at https://github.com/ncsu-dk-lab/E2D.
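The two-phase strategy described above can be illustrated with a minimal toy sketch. This is not the authors' implementation (see the linked repository for that); it only shows the control flow of uniform exploration followed by loss-guided exploitation. Everything here is an assumption made for illustration: the function names (`region_loss`, `update_region`, `two_phase_optimize`), the representation of an image as a flat list of values split into index-list regions, the squared-error loss standing in for a distillation objective, and the `top_k` rule for choosing high-loss regions.

```python
import random


def region_loss(synthetic, target, region):
    """Mean squared error over one region (toy stand-in for a distillation loss)."""
    return sum((synthetic[i] - target[i]) ** 2 for i in region) / len(region)


def update_region(synthetic, target, region, lr):
    """One in-place gradient-descent step on the squared error of one region."""
    for i in region:
        synthetic[i] -= lr * 2.0 * (synthetic[i] - target[i])


def two_phase_optimize(synthetic, target, regions, explore_steps, exploit_steps,
                       top_k=2, lr=0.1, seed=0):
    """Hypothetical helper: uniform exploration, then loss-guided exploitation."""
    if explore_steps < 1:
        raise ValueError("exploration must run at least once to observe losses")
    rng = random.Random(seed)
    synthetic = list(synthetic)  # work on a copy; the caller's list is untouched
    last_loss = {}               # region index -> most recently observed loss

    # Exploration: sample regions uniformly, update them, and record their loss.
    for _ in range(explore_steps):
        r = rng.randrange(len(regions))
        update_region(synthetic, target, regions[r], lr)
        last_loss[r] = region_loss(synthetic, target, regions[r])

    # Exploitation: update only the top_k regions with the highest recorded loss.
    for _ in range(exploit_steps):
        hardest = sorted(last_loss, key=last_loss.get, reverse=True)[:top_k]
        r = rng.choice(hardest)
        update_region(synthetic, target, regions[r], lr)
        last_loss[r] = region_loss(synthetic, target, regions[r])

    return synthetic, last_loss


if __name__ == "__main__":
    target = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
    regions = [[0, 1], [2, 3], [4, 5], [6, 7]]
    result, losses = two_phase_optimize([0.0] * 8, target, regions,
                                        explore_steps=8, exploit_steps=8)
    print(result)
    print(losses)
```

In this toy setting, the exploitation loop spends its step budget on the regions that still have the largest recorded error instead of revisiting regions that are already well fit, which is the redundancy-reduction idea the abstract describes. The actual method operates on images and a real distillation loss, so the selection rule and update step there may differ from this sketch.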
