Data-Free Adversarial Distillation

Gongfan Fang, Jie Song, Chengchao Shen, Xinchao Wang, Da Chen, Mingli Song

TL;DR

This work tackles the challenge of distilling knowledge without access to real training data by introducing Data-Free Adversarial Distillation (DFAD). DFAD defines an optimizable upper bound on the teacher-student discrepancy and uses a generator to produce hard samples, with a two-stage adversarial training process (imitation and generation) that yields stable learning and continual hard-sample discovery. The method demonstrates competitive performance with data-driven KD on classification and achieves state-of-the-art results in semantic segmentation among data-free approaches. Overall, DFAD provides a scalable, data-efficient pathway for model compression and knowledge transfer when training data are unavailable.

Abstract

Knowledge Distillation (KD) has made remarkable progress in the last few years and has become a popular paradigm for model compression and knowledge transfer. However, almost all existing KD algorithms are data-driven, i.e., they rely on a large amount of original training data or alternative data, which is usually unavailable in real-world scenarios. In this paper, we address this challenging problem and propose a novel adversarial distillation mechanism to craft a compact student model without any real-world data. We introduce a model discrepancy to quantitatively measure the difference between the student and teacher models, and construct an optimizable upper bound on it. In our work, the student and the teacher jointly play the role of the discriminator to reduce this discrepancy, while a generator adversarially produces "hard samples" to enlarge it. Extensive experiments demonstrate that the proposed data-free method yields performance comparable to existing data-driven methods. More strikingly, our approach can be directly extended to semantic segmentation, which is more complicated than classification, where it achieves state-of-the-art results. Code and pretrained models are available at https://github.com/VainF/Data-Free-Adversarial-Distillation.
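
The adversarial game described in the abstract can be written compactly as a min-max problem. The block below is an illustrative sketch rather than a quotation from the paper: the symbols T (teacher), S (student), G (generator), z (noise drawn from a prior p_z) and n (output dimension) are notation assumed here, and the L1 (MAE) form of the discrepancy is assumed on the basis of the Figure 4 caption, which reports MAE as the best-performing loss candidate.

```latex
% Assumed notation: T = pretrained teacher, S = student, G = generator,
% z ~ p_z = input noise, n = dimension of the model output.

% Discrepancy measured on generated samples (assumed MAE / L1 form):
\mathcal{D}(T, S; G) =
  \mathbb{E}_{z \sim p_z}\left[ \frac{1}{n}
  \left\lVert T(G(z)) - S(G(z)) \right\rVert_1 \right]

% Two-player objective: the student shrinks the discrepancy,
% the generator enlarges it by searching for hard samples.
\min_{S} \; \max_{G} \; \mathcal{D}(T, S; G)
```

Read against the two training stages: in the imitation stage G is held fixed and S is updated to decrease the discrepancy, and in the generation stage S is held fixed and G is updated to increase it. The teacher is pretrained and is not updated in either stage.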

Paper Structure

This paper contains 25 sections, 8 equations, 12 figures, 7 tables, 1 algorithm.

Figures (12)

  • Figure 1: The original training data for pretrained models is usually unavailable to users. In this case, alternative data or synthetic data is used for model compression.
  • Figure 2: Framework of Data-Free Adversarial Distillation. We construct an upper bound for model discrepancy, under hard sample constraint.
  • Figure 3: Generated samples on MNIST, CIFAR10 and CIFAR100. The images in the second row are sampled from real data.
  • Figure 4: The accuracy curve of different loss functions on CIFAR10. MAE achieves the best performance among those loss candidates.
  • Figure 5: Segmentation results on CamVid and NYUv2. All baseline methods in the figure are data-driven and our framework achieves the best performance when the original training data is not available.
  • ...and 7 more figures