Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation

Muquan Li, Dongyang Zhang, Tao He, Xiurui Xie, Yuan-Fang Li, Ke Qin

TL;DR

This paper revises the common data synthesis paradigm in DFKD into a composite process: diffusion models are applied after data synthesis for self-supervised augmentation, generating a spectrum of data samples with similar distributions while retaining controlled variations.

Abstract

Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in the domain of model compression, substantially reducing the dependency on the original training data. Nonetheless, conventional DFKD methods that employ synthesized training data are prone to the limitations of inadequate diversity and discrepancies in distribution between the synthesized and original datasets. To address these challenges, this paper introduces an innovative approach to DFKD through diverse diffusion augmentation (DDA). Specifically, we revise the paradigm of common data synthesis in DFKD to a composite process through leveraging diffusion models subsequent to data synthesis for self-supervised augmentation, which generates a spectrum of data samples with similar distributions while retaining controlled variations. Furthermore, to mitigate excessive deviation in the embedding space, we introduce an image filtering technique grounded in cosine similarity to maintain fidelity during the knowledge distillation process. Comprehensive experiments conducted on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets showcase the superior performance of our method across various teacher-student network configurations, outperforming the contemporary state-of-the-art DFKD methods. Code will be available at: https://github.com/SLGSP/DDA.
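
The cosine-similarity filtering mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of keeping samples at or above a threshold, and the threshold value of 0.75 are assumptions made for illustration, and the embeddings are assumed to come from some feature extractor that is not shown here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as maximally dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def filter_augmented(original_emb, augmented_embs, threshold=0.75):
    """Keep augmented samples whose embedding stays close to the original.

    Returns a list of (index, similarity) pairs for the kept samples.
    The threshold is a hypothetical value, not one taken from the paper.
    """
    kept = []
    for idx, emb in enumerate(augmented_embs):
        sim = cosine_similarity(original_emb, emb)
        if sim >= threshold:
            kept.append((idx, sim))
    return kept


if __name__ == "__main__":
    original = [1.0, 0.0]
    augmented = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for idx, sim in filter_augmented(original, augmented, threshold=0.7):
        print(f"kept augmented sample {idx} (cosine similarity {sim:.3f})")
```

In this toy example the second augmented embedding is orthogonal to the original and is discarded, while the other two are kept. A real pipeline would compute embeddings in batches on the accelerator; the plain-Python loop is only meant to make the selection rule explicit.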

Paper Structure

This paper contains 25 sections, 10 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: Comparison of the traditional DFKD with our method in terms of the overall framework.
  • Figure 2: Visualization of the synthesized data used in knowledge distillation training for the pre-trained wrn-40-2 model on CIFAR-10. Three representative DFKD methods, ADI DBLP:conf/cvpr/YinMALMHJK20, CMI DBLP:conf/ijcai/FangSWSWS21, and SpaceshipNet DBLP:conf/cvpr/YuC0J23, are chosen for comparison with our method. Our DDA achieves visibly stronger instance distinguishability.
  • Figure 3: The illustrative framework of the proposed diverse diffusion augmentation (DDA) DFKD method. The three steps we present in the overall DFKD are arranged from left to right.
  • Figure 4: Visualization of the diffusion augmentation and image filtering process. For several original images, the diverse augmented images and the filtered low-quality images are shown.
  • Figure 5: The influence of the cosine similarity on two teacher-student networks, wrn-40-2 to wrn-16-1 and resnet-34 to resnet-18. The positive correlation tendencies demonstrate the positive effect of cosine similarity on the results.
  • ...and 3 more figures