Conditional Pseudo-Supervised Contrast for Data-Free Knowledge Distillation

Renrong Shao, Wei Zhang, Jun Wang

TL;DR

This work tackles data-free knowledge distillation (DFKD) for privacy-preserving model compression and proposes Conditional Pseudo-Supervised Contrast for DFKD (CPSC-DFKD). It introduces a category-conditioned generator with categorical feature embeddings (CFE) and a pseudo-supervised contrastive objective to produce category-specific, diverse samples, while combining an IKD-based distillation loss with a conditional cross-entropy and a teacher-student contrastive loss $L_{SCL}$. The generator and student are optimized with composite losses, notably $L_{IKD}$, $L_{CE}$, $L_{SCL}$, and $L_{BN}$, enabling category-aware distribution alignment and diversity. Experiments on CIFAR-10/100 and Tiny-ImageNet show consistent improvements over prior DFKD methods, highlighting enhanced sample diversity and distillation efficacy, with practical implications for privacy-preserving deployment and long-tail learning.

Abstract

Data-free knowledge distillation (DFKD) is an effective way to address model compression and transmission restrictions while preserving privacy, and it has attracted extensive attention in recent years. Most existing methods use a generator to synthesize images to support the distillation. Although these methods have achieved great success, several issues remain to be explored. First, the outstanding performance of supervised learning in deep learning motivates us to explore a pseudo-supervised paradigm for DFKD. Second, current synthesis methods cannot distinguish the distributions of different categories of samples, and thus produce ambiguous samples that may lead to an incorrect evaluation by the teacher. Third, current methods cannot optimize category-wise sample diversity, which hinders the student model from learning from diverse samples and achieving better performance. In this paper, to address these limitations, we propose a novel learning paradigm, conditional pseudo-supervised contrast for data-free knowledge distillation (CPSC-DFKD). The primary innovations of CPSC-DFKD are: (1) introducing a conditional generative adversarial network to synthesize category-specific diverse images for pseudo-supervised learning, (2) improving the modules of the generator to distinguish the distributions of different categories, and (3) proposing pseudo-supervised contrastive learning based on teacher and student views to enhance diversity. Comprehensive experiments on three commonly used datasets validate the performance gains that CPSC-DFKD brings to both the student and the generator. The code is available at https://github.com/RoryShao/CPSC-DFKD.git
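To make the idea of pseudo-supervised contrast over teacher and student views more concrete, here is a minimal sketch in plain Python. It assumes a generic supervised-contrastive (SupCon-style) formulation, with the categories the generator was conditioned on serving as pseudo-labels. The function names, the temperature value, and the exact form of the loss are illustrative assumptions and may differ from the paper's $L_{SCL}$.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def pseudo_supervised_contrast(teacher_feats, student_feats, labels, temperature=0.1):
    """SupCon-style loss over teacher and student views of one synthetic batch.

    Assumption: `labels` are the categories the generator was conditioned on,
    used here as pseudo-labels. Views sharing a label are treated as positives.
    """
    views = [l2_normalize(f) for f in teacher_feats + student_feats]
    view_labels = list(labels) + list(labels)
    n = len(views)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity of anchor i to every other view.
        logits = {
            j: sum(a * b for a, b in zip(views[i], views[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in logits if view_labels[j] == view_labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all non-anchor views.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Under this formulation the loss is small when teacher and student features of the same pseudo-category lie close together and features of different categories lie apart, which is the behavior the abstract attributes to the contrastive objective.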

Paper Structure

This paper contains 28 sections, 11 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: Conceptual diagram of different distillation approaches. (a) Knowledge distillation with human labels. (b) Previous data-free distillation approaches with a generator. (c) Our proposed data-free approach with a conditional generator.
  • Figure 2: The overall workflow of CPSC-DFKD. A conditional generator with CFE layers synthesizes labeled images for the teacher and student through adversarial distillation. The CFE layer associates categories with features by rebuilding embedding layers based on BN layers. In the penultimate layer, the feature representations are mapped to a new space and their discrepancy is compared under the supervision of category labels. In addition, knowledge is distilled from teacher to student at the output layer.
  • Figure 3: Effect of $\alpha$, $\beta$, $\gamma$, and $\eta$ on the CIFAR-100 dataset.
  • Figure 4: Visualization of synthetic category-specific diverse images from WRN-40-2 to WRN-16-1 by different approaches on CIFAR-10.
  • Figure 5: Visualization of synthetic samples from WRN-40-2 to WRN-16-1 by different approaches on CIFAR-100.
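The Figure 2 caption describes CFE layers as category embeddings built on BN layers. One generic technique of that kind is conditional batch normalization, where each category looks up its own scale and shift. The sketch below illustrates that generic technique for a batch of flat feature vectors. The class name, its attributes, and the identity initialization are hypothetical, and the paper's actual CFE layer may be constructed differently.

```python
import math


class ConditionalBatchNorm:
    """Batch norm with per-category scale and shift (hypothetical stand-in for CFE)."""

    def __init__(self, num_classes, num_features, eps=1e-5):
        self.eps = eps
        # One (gamma, beta) row per category, initialised to the identity transform.
        self.gamma = [[1.0] * num_features for _ in range(num_classes)]
        self.beta = [[0.0] * num_features for _ in range(num_classes)]

    def __call__(self, x, labels):
        n, c = len(x), len(x[0])
        # Per-feature batch statistics, shared by all categories.
        mean = [sum(row[k] for row in x) / n for k in range(c)]
        var = [sum((row[k] - mean[k]) ** 2 for row in x) / n for k in range(c)]
        out = []
        for row, y in zip(x, labels):
            # Normalise, then apply the scale and shift of this sample's category.
            out.append([
                self.gamma[y][k] * (row[k] - mean[k]) / math.sqrt(var[k] + self.eps)
                + self.beta[y][k]
                for k in range(c)
            ])
        return out
```

In a trained generator the `gamma` and `beta` tables would be learned, so that samples conditioned on different categories are pushed toward different feature distributions after normalization.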