Generative Dataset Distillation: Balancing Global Structure and Local Details

Longzhen Li, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

TL;DR

The paper addresses the efficiency and generalization gaps in dataset distillation by distilling into a conditional GAN and enforcing global-local coherence. It introduces a two-loss framework—global structure via logits and local detail via intermediate features—optimized with a model pool to improve cross-architecture robustness. After training, the generator enables on-demand distilled data generation, reducing redeployment costs when IPC or architectures change. Experiments on MNIST, Fashion-MNIST, and CIFAR-10 show state-of-the-art performance across IPC settings and strong cross-architecture generalization, confirming practical benefits for scalable data distillation workflows.
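The two-loss objective described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each network in the model pool returns a pair (intermediate features, logits). The global term is taken to be the mean squared error between batch-mean logits of real and synthetic images, and the local term the same on batch-mean intermediate features. The two terms are combined as L_global + ω_l · L_local, using the weight ω_l that appears in Figure 3. The exact loss forms in the paper may differ. `global_local_loss`, `sample_model`, and `linear_model` are hypothetical names, and the toy "models" are stand-ins for real architectures.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical model interface: image -> (intermediate features, logits)
Model = Callable[[Vector], Tuple[Vector, Vector]]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean over a batch of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def global_local_loss(model: Model, real: Sequence[Vector],
                      synthetic: Sequence[Vector], omega_l: float) -> float:
    """Assumed form: L = L_global(logits) + omega_l * L_local(features)."""
    real_out = [model(x) for x in real]
    syn_out = [model(x) for x in synthetic]
    # Global structure: compare batch-mean logits (final-layer outputs).
    l_global = mse(mean_vector([out[1] for out in real_out]),
                   mean_vector([out[1] for out in syn_out]))
    # Local detail: compare batch-mean intermediate features.
    l_local = mse(mean_vector([out[0] for out in real_out]),
                  mean_vector([out[0] for out in syn_out]))
    return l_global + omega_l * l_local


def sample_model(pool: Sequence[Model], rng: random.Random) -> Model:
    """Pick one model from the pool for this step (hypothetical schedule)."""
    return rng.choice(pool)


# Toy usage: two hypothetical "architectures" in the pool.
def linear_model(scale: float) -> Model:
    def forward(x: Vector) -> Tuple[Vector, Vector]:
        features = [scale * v for v in x]         # stand-in feature map
        logits = [sum(features), -sum(features)]  # stand-in 2-class logits
        return features, logits
    return forward


pool = [linear_model(1.0), linear_model(2.0)]
real = [[1.0, 0.0], [0.0, 1.0]]
synthetic = [[1.0, 1.0], [1.0, 1.0]]
loss = global_local_loss(pool[0], real, synthetic, omega_l=0.1)
print(loss)  # about 1.025: logit term 1.0 + 0.1 * feature term 0.25
step_model = sample_model(pool, random.Random(0))
```

Sampling a different model from the pool at each step is one plausible way to keep the generator from overfitting to a single architecture, which is the cross-architecture motivation stated above. In a real setup the synthetic batch would come from the generator, and the loss would be backpropagated into its parameters.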

Abstract

In this paper, we propose a new dataset distillation method that balances global structure and local details when distilling the information from a large dataset into a generative model. Dataset distillation has been proposed to reduce the size of the dataset required to train models. Conventional dataset distillation methods suffer from long redeployment times and poor cross-architecture performance. Moreover, previous methods focused too heavily on matching high-level semantic attributes between the synthetic and original datasets while ignoring local features such as texture and shape. Based on these observations, we propose a new method for distilling an original image dataset into a generative model. Our method uses a conditional generative adversarial network to generate the distilled dataset. During distillation, we balance global structure and local details, continuously optimizing the generator so that it produces more information-dense datasets.
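The practical benefit of distilling into a conditional generator is that a distilled set of any size can be sampled after training. This section illustrates that step with a minimal sketch. It assumes the trained generator maps a standard-normal latent vector z and a class label to an image, and that IPC means images per class. `generate_distilled_set` and `toy_generator` are hypothetical stand-ins, not the paper's code.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical trained conditional generator: (latent z, class label) -> image
Generator = Callable[[Vector, int], Vector]


def generate_distilled_set(generator: Generator, num_classes: int, ipc: int,
                           latent_dim: int, seed: int = 0
                           ) -> List[Tuple[Vector, int]]:
    """Sample `ipc` images per class from a conditional generator.

    Changing `ipc` only changes how many (z, label) pairs are drawn;
    the same generator is reused without re-running distillation.
    """
    rng = random.Random(seed)
    dataset = []
    for label in range(num_classes):
        for _ in range(ipc):
            # Assumed latent prior: independent standard normal entries.
            z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            dataset.append((generator(z, label), label))
    return dataset


def toy_generator(z: Vector, label: int) -> Vector:
    # Stand-in for a trained cGAN: shifts the latent code by the label.
    return [v + label for v in z]


small = generate_distilled_set(toy_generator, num_classes=10, ipc=1, latent_dim=4)
large = generate_distilled_set(toy_generator, num_classes=10, ipc=10, latent_dim=4)
print(len(small), len(large))  # 10 100
```

Under these assumptions, moving from IPC = 1 to IPC = 10 costs only extra forward passes through the generator rather than a new distillation run. This is the redeployment saving the paper emphasizes.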

Paper Structure (15 sections, 10 equations, 3 figures, 2 tables, 1 algorithm)

Figures (3)

  • Figure 1: Overview of the distillation process. The goal is to train a generator that synthesizes images rich in information (referred to as distilled images), taking into account both global structure and local details.
  • Figure 2: Distilled MNIST, Fashion-MNIST, and CIFAR-10 datasets with IPC = 10.
  • Figure 3: Ablation study of $\omega_l$ on the CIFAR-10 dataset with IPC = 1.