Data-to-Model Distillation: Data-Efficient Learning Framework

Ahmad Sajedi, Samir Khaki, Lucy Z. Liu, Ehsan Amjadian, Yuri A. Lawryshyn, Konstantinos N. Plataniotis

TL;DR

A novel framework called Data-to-Model Distillation (D2M) is proposed to distill the real dataset's knowledge into the learnable parameters of a pre-trained generative model by aligning rich representations extracted from real and generated images.

Abstract

Dataset distillation aims to distill the knowledge of a large-scale real dataset into small yet informative synthetic data such that a model trained on it performs as well as a model trained on the full dataset. Despite recent progress, existing dataset distillation methods often struggle with computational efficiency, scalability to complex high-resolution datasets, and generalizability to deep architectures. These approaches typically require retraining when the distillation ratio changes, as knowledge is embedded in raw pixels. In this paper, we propose a novel framework called Data-to-Model Distillation (D2M) to distill the real dataset's knowledge into the learnable parameters of a pre-trained generative model by aligning rich representations extracted from real and generated images. The learned generative model can then produce informative training images for different distillation ratios and deep architectures. Extensive experiments on 15 datasets of varying resolutions show D2M's superior performance, re-distillation efficiency, and cross-architecture generalizability. Our method effectively scales up to high-resolution 128x128 ImageNet-1K. Furthermore, we verify D2M's practical benefits for downstream applications in neural architecture search.
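The re-distillation point above can be illustrated with a toy sketch. Because the knowledge lives in the generator's parameters rather than in a fixed set of pixels, changing the number of images per class (IPC) only changes how many noise vectors are sampled. Everything here is a hypothetical stand-in: `toy_generator`, `synthesize`, the parameter list `theta`, and the noise dimension are illustrative assumptions, not the paper's generator or API.

```python
import random

def toy_generator(theta, label, z):
    # Hypothetical stand-in for a conditional generator G_theta(z, y):
    # returns a tiny "image" (a list of floats) that depends on label and noise.
    return [theta[label] + 0.1 * zi for zi in z]

def synthesize(theta, num_classes, ipc, noise_dim=4, seed=0):
    # Draw `ipc` noise vectors per class and push them through the frozen generator.
    rng = random.Random(seed)
    data = []
    for label in range(num_classes):
        for _ in range(ipc):
            z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
            data.append((toy_generator(theta, label, z), label))
    return data

# Assumed "distilled" parameters, one scalar per class, fixed after distillation.
theta = [0.0, 1.0, 2.0]

# Two different distillation ratios from the same parameters, with no re-distillation.
small = synthesize(theta, num_classes=3, ipc=1)
large = synthesize(theta, num_classes=3, ipc=10)
```

A pixel-space method would instead have to re-optimize a new set of synthetic images for each IPC value, which is the retraining cost the abstract refers to.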

Paper Structure

This paper contains 23 sections, 5 equations, 38 figures, 13 tables, 1 algorithm.

Figures (38)

  • Figure 1: Different distillation frameworks for efficient learning.
  • Figure 2: An overview of the proposed D2M framework. D2M distills the knowledge of large-scale datasets into the parameter space of a pre-trained generator through embedding matching and prediction matching modules. The learned generator can then produce small yet informative training images for downstream classification tasks. $Z$ and $Y$ represent the random noise and the labels, respectively.
  • Figure 3: Performance comparison and count of parameters on 128$\times$128 ImageNet-1K.
  • Figure 4: The effect of (a) temperature, (b) batch size, and (c) task balance on D2M's performance for CIFAR-10 with IPC10.
  • Figure 5: Example generated images from 32$\times$32 CIFAR-100, 64$\times$64 Tiny ImageNet, 128$\times$128 ImageNet-1K, and 128$\times$128 ImageSquawk.
  • ...and 33 more figures
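
The Figure 2 and Figure 4 captions name two matching modules (embedding matching and prediction matching) and two hyperparameters (temperature and task balance). The sketch below shows one plausible way such terms could be combined. The specific forms are assumptions and are not taken from the paper: embedding matching is written as the squared distance between batch-mean features, prediction matching as a KL divergence between temperature-softened batch-mean logits, and the function names and the `balance` weight are hypothetical.

```python
import math

def mean_vector(rows):
    # Column-wise mean of a batch given as a list of equal-length lists.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def embedding_matching(real_feats, gen_feats):
    # Assumed form: squared L2 distance between batch-mean embeddings.
    mu_r, mu_g = mean_vector(real_feats), mean_vector(gen_feats)
    return sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))

def softmax(logits, temperature):
    # Temperature-softened softmax, shifted by the max for numerical stability.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_matching(real_logits, gen_logits, temperature):
    # Assumed form: KL(p_real || p_gen) on softened batch-mean logits.
    p = softmax(mean_vector(real_logits), temperature)
    q = softmax(mean_vector(gen_logits), temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def matching_loss(real_feats, gen_feats, real_logits, gen_logits,
                  temperature=4.0, balance=1.0):
    # `balance` is a hypothetical weight trading off the two terms.
    return (embedding_matching(real_feats, gen_feats)
            + balance * prediction_matching(real_logits, gen_logits, temperature))

real_f = [[1.0, 2.0], [3.0, 4.0]]
gen_f = [[1.5, 2.5], [2.0, 3.0]]
real_l = [[2.0, 0.5, -1.0], [1.0, 0.0, -0.5]]
gen_l = [[0.5, 1.5, -1.0], [0.0, 1.0, 0.0]]
loss = matching_loss(real_f, gen_f, real_l, gen_l)
```

In a real setup the features and logits would come from a feature extractor applied to real and generated images, and the loss would be backpropagated into the generator's parameters. Here the inputs are plain lists so the arithmetic can be followed by hand.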