Generative Dataset Distillation: Balancing Global Structure and Local Details
Longzhen Li, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
TL;DR
The paper addresses the efficiency and generalization gaps in dataset distillation by distilling a dataset into a conditional GAN and enforcing global-local coherence. It introduces a two-loss framework, in which a global-structure loss matches logits and a local-detail loss matches intermediate features, and optimizes it with a model pool to improve cross-architecture robustness. After training, the generator produces distilled data on demand, which reduces redeployment cost when the number of images per class (IPC) or the target architecture changes. Experiments on MNIST, Fashion-MNIST, and CIFAR-10 show state-of-the-art performance across IPC settings and strong cross-architecture generalization.
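The two-loss idea can be illustrated with a minimal sketch. It is not the authors' implementation: the squared-distance form of each term, the `local_weight` parameter, the toy linear "models", and the choice to draw one model at random from the pool per step are all assumptions made for illustration, and every function name here is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical model interface: input -> (intermediate features, logits).
Model = Callable[[Vector], Tuple[Vector, Vector]]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a batch of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Sum of squared element-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def global_local_loss(model: Model,
                      real_batch: Sequence[Vector],
                      synthetic_batch: Sequence[Vector],
                      local_weight: float = 1.0) -> float:
    """Global term on batch-mean logits plus weighted local term on batch-mean features."""
    real_out = [model(x) for x in real_batch]
    syn_out = [model(x) for x in synthetic_batch]

    # Global structure: compare the batch-mean logits (assumed form).
    global_term = squared_distance(mean_vector([logits for _, logits in real_out]),
                                   mean_vector([logits for _, logits in syn_out]))
    # Local details: compare the batch-mean intermediate features (assumed form).
    local_term = squared_distance(mean_vector([feats for feats, _ in real_out]),
                                  mean_vector([feats for feats, _ in syn_out]))
    return global_term + local_weight * local_term


def make_linear_model(scale: float) -> Model:
    """Toy stand-in for a network: scaled inputs as features, two logits from their sum."""
    def model(x: Vector) -> Tuple[Vector, Vector]:
        features = [scale * v for v in x]
        total = sum(features)
        return features, [total, -total]
    return model


def pool_loss(model_pool: Sequence[Model],
              real_batch: Sequence[Vector],
              synthetic_batch: Sequence[Vector],
              rng: random.Random,
              local_weight: float = 1.0) -> float:
    """Evaluate the loss with one model drawn at random from the pool (assumed sampling scheme)."""
    model = rng.choice(model_pool)
    return global_local_loss(model, real_batch, synthetic_batch, local_weight)
```

In a real setup the synthetic batch would come from the conditional generator and the loss would be backpropagated into its parameters. The sketch only shows how the two terms combine and how a pool of models can vary the evaluator between steps.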
Abstract
In this paper, we propose a new dataset distillation method that balances global structure and local details when distilling the information from a large dataset into a generative model. Dataset distillation has been proposed to reduce the size of the dataset required to train models. Conventional dataset distillation methods suffer from long redeployment time and poor cross-architecture performance. Moreover, previous methods focus too heavily on matching high-level semantic attributes between the synthetic dataset and the original dataset while ignoring local features such as texture and shape. Motivated by these observations, we propose a new method for distilling the original image dataset into a generative model. Our method uses a conditional generative adversarial network to generate the distilled dataset. We then balance global structure and local details during distillation, continuously optimizing the generator so that it produces more information-dense datasets.
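To make the redeployment argument concrete, the sketch below shows how a trained conditional generator could emit a distilled set for any IPC without re-running distillation. The generator interface (`noise, label -> image`), the latent dimension, and the `toy_generator` stand-in are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List, Tuple

Image = List[float]
# Hypothetical conditional generator interface: (noise vector, class label) -> image.
Generator = Callable[[List[float], int], Image]


def generate_distilled_set(generator: Generator,
                           num_classes: int,
                           ipc: int,
                           latent_dim: int,
                           seed: int = 0) -> List[Tuple[Image, int]]:
    """Sample `ipc` images per class from a conditional generator."""
    rng = random.Random(seed)
    dataset: List[Tuple[Image, int]] = []
    for label in range(num_classes):
        for _ in range(ipc):
            # Fresh Gaussian noise for each sample, conditioned on the class label.
            noise = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            dataset.append((generator(noise, label), label))
    return dataset


def toy_generator(noise: List[float], label: int) -> Image:
    """Stand-in for a trained generator; only the call signature matters here."""
    return [label + 0.1 * z for z in noise]


# Changing IPC is a matter of calling the same generator again.
small_set = generate_distilled_set(toy_generator, num_classes=10, ipc=1, latent_dim=4)
large_set = generate_distilled_set(toy_generator, num_classes=10, ipc=50, latent_dim=4)
```

Here `small_set` holds 10 labelled samples and `large_set` holds 500, both drawn from the same generator.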
