Large-Scale Data-Free Knowledge Distillation for ImageNet via Multi-Resolution Data Generation

Minh-Tuan Tran, Trung Le, Xuan-May Le, Jianfei Cai, Mehrtash Harandi, Dinh Phung

TL;DR

MUlti-reSolution data-freE (MUSE) is introduced, which generates images at lower resolutions while using Class Activation Maps (CAMs) to ensure that the generated images retain critical, class-specific features, leading to significant performance improvements.

Abstract

Data-Free Knowledge Distillation (DFKD) is an advanced technique that enables knowledge transfer from a teacher model to a student model without relying on the original training data. While DFKD methods have achieved success on smaller datasets such as CIFAR10 and CIFAR100, they encounter challenges on larger, high-resolution datasets such as ImageNet. A primary issue with previous approaches is that they generate synthetic images at high resolutions (e.g., $224 \times 224$) without leveraging information from real images, which often results in noisy images that lack essential class-specific features in large datasets. Additionally, the computational cost of generating the extensive data needed for effective knowledge transfer can be prohibitive. In this paper, we introduce MUlti-reSolution data-freE (MUSE) to address these limitations. MUSE generates images at lower resolutions while using Class Activation Maps (CAMs) to ensure that the generated images retain critical, class-specific features. To further enhance model diversity, we propose multi-resolution generation and embedding diversity techniques that strengthen latent space representations, leading to significant performance improvements. Experimental results demonstrate that MUSE achieves state-of-the-art performance across both small- and large-scale datasets, with double-digit performance gains in nearly all ImageNet and subset experiments. Code is available at https://github.com/tmtuan1307/muse.
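
To make the role of CAMs concrete, the following is a minimal sketch of the standard class activation map computation (a class-weighted sum of the final convolutional feature maps, min-max normalized). It is not the MUSE loss itself, whose exact form is not given in this summary; the function name `class_activation_map` and the toy inputs are assumptions chosen purely for illustration.

```python
def class_activation_map(feature_maps, class_weights):
    """Standard CAM: weighted sum of the final conv feature maps.

    feature_maps: list of K maps, each an H x W nested list of floats.
    class_weights: list of K classifier weights for the target class.
    Returns an H x W map min-max normalized to [0, 1].
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("need one weight per feature map")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    # Accumulate each feature map scaled by its class weight.
    for fmap, w in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * fmap[i][j]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    if span == 0.0:
        # A constant map carries no localization signal.
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / span for v in row] for row in cam]


# Toy input (hypothetical): two 2x2 feature maps and their class weights.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
weights = [2.0, 0.5]
cam = class_activation_map(features, weights)
```

High values in the resulting map mark the spatial locations that contribute most to the target class score. A generator could, under these assumptions, be encouraged to place class-relevant content in such regions.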

Paper Structure

This paper contains 24 sections, 14 equations, 7 figures, 15 tables, 1 algorithm.

Figures (7)

  • Figure 1: Accuracies of our MUSE method and the current SOTA methods Fast [fastdfkd] and NAYER [nayer] on ImageNet1K, all evaluated under approximately the same training time and at the same data-scale ratios of 1%, 5%, 10%, and 20% of the original training set (bubble size).
  • Figure 2: (a) The previous model fails to capture class-specific features and contains a lot of noisy pixels. (b) The visualization demonstrates that only a small set of key features is important for classifiers. (c) Our model generates synthetic images at lower resolutions and leverages CAM to generate pixels containing important information.
  • Figure 3: Accuracy and training time when using lower-resolution $112 \times 112$ images (MUSE) compared to higher-resolution $224 \times 224$ images (NAYER [nayer]) on ImageNet1K under various images-per-class settings (bubble size). Using lower-resolution images not only improves performance but also significantly reduces training time.
  • Figure 4: (a) Overview of the MUSE architecture, illustrating the two-phase training process: generator training and student training. The model generates lower-resolution images and enhances their quality using CAM-Enhanced Quality Loss, while also promoting diversity through Embedding Diversity Loss ($\mathcal{L}_{ed}$ and $\mathcal{L}_{aed}$). (b) $\mathcal{L}_{ed}$ (Eq. \ref{eq:le}) aims to learn the embedding in $\mathcal{S}$ of all old data, bringing it closer to ${\bm{f}}_{{\bm{y}}}$, while (c) $\mathcal{L}_{aed}$ (Eq. \ref{eq:laed}) guides the generator $\mathcal{G}$ to produce new data that is distant from ${\bm{f}}_{{\bm{y}}}$, thus enhancing the model's diversity.
  • Figure 5: The accuracy at data ratios from 10% to 100% is shown for the teacher (ResNet34) and student (ResNet18) models.
  • ...and 2 more figures