Dataset Condensation with Distribution Matching
Bo Zhao, Hakan Bilen
TL;DR
The paper addresses the rising computational cost of training deep models with a dataset condensation method that synthesizes a small, informative training set by matching the distributions of real and synthetic data in many randomly sampled embedding spaces. The objective minimizes the maximum mean discrepancy (MMD) between real and synthetic data in each sampled embedding, which avoids the bi-level optimization and second-order derivatives required by prior approaches. This yields large speedups in synthesis (e.g., ~45x on CIFAR-10) while maintaining or improving accuracy. The method scales to larger datasets such as TinyImageNet and ImageNet-1K, synthesizes each class separately, and can therefore be parallelized. The paper also reports practical benefits in continual learning and neural architecture search, strong cross-architecture generalization, and ablations over the choice of embedding network. Overall, the work offers an efficient and scalable alternative to earlier dataset condensation methods.
Abstract
The computational cost of training state-of-the-art deep models in many learning problems is rapidly increasing due to more sophisticated models and larger datasets. A recent promising direction for reducing training cost is dataset condensation, which aims to replace the original large training set with a significantly smaller learned synthetic set while preserving the original information. While training deep models on the small set of condensed images can be extremely fast, their synthesis remains computationally expensive due to the complex bi-level optimization and second-order derivative computation. In this work, we propose a simple yet effective method that synthesizes condensed images by matching feature distributions of the synthetic and original training images in many sampled embedding spaces. Our method significantly reduces the synthesis cost while achieving comparable or better performance. Thanks to its efficiency, we apply our method to more realistic and larger datasets with sophisticated neural architectures and obtain a significant performance boost. We also show promising practical benefits of our method in continual learning and neural architecture search.
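The core idea of matching feature distributions in sampled embedding spaces can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single class, flat feature vectors instead of images, a one-layer random ReLU map as a stand-in for a randomly initialized network, and a hand-derived gradient in place of automatic differentiation. The names `embed`, `mmd_loss_and_grad`, and `condense`, as well as all hyperparameters, are hypothetical.

```python
import numpy as np


def embed(x, w):
    """Random ReLU feature map; a stand-in for a randomly initialized network."""
    return np.maximum(x @ w, 0.0)


def mmd_loss_and_grad(real, syn, w):
    """Squared distance between mean embeddings, and its gradient w.r.t. syn."""
    pre = syn @ w                                        # (m, k) pre-activations
    diff = embed(real, w).mean(axis=0) - np.maximum(pre, 0.0).mean(axis=0)
    loss = float(diff @ diff)
    # d loss / d syn_j = -(2 / m) * ((diff * 1[pre_j > 0]) @ w.T)
    grad = -(2.0 / len(syn)) * ((pre > 0) * diff) @ w.T
    return loss, grad


def condense(real, n_syn=10, steps=500, lr=0.1, width=64, seed=0):
    """Learn n_syn synthetic points whose mean embedding tracks that of real."""
    rng = np.random.default_rng(seed)
    dim = real.shape[1]
    # Initialize the synthetic set from randomly chosen real samples.
    syn = real[rng.choice(len(real), n_syn, replace=False)]
    for _ in range(steps):
        # Sample a fresh random embedding at every step; it is never trained.
        w = rng.normal(size=(dim, width)) / np.sqrt(dim)
        _, grad = mmd_loss_and_grad(real, syn, w)
        syn -= lr * grad                                 # first-order update only
    return syn


if __name__ == "__main__":
    data = np.random.default_rng(0).normal(size=(1000, 16)) + 2.0
    condensed = condense(data, n_syn=10)
    print(condensed.shape)
```

Each update needs only one forward pass through a fixed random embedding and a first-order gradient with respect to the synthetic points, so there is no inner training loop and no second-order term. Under the same assumptions, running `condense` once per class would give the per-class, parallelizable procedure described in the summary.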
