Relation-Guided Adversarial Learning for Data-free Knowledge Transfer

Yingping Liang, Ying Fu

TL;DR

This work tackles the data-free knowledge distillation challenge by addressing data homogeneity through Relation-Guided Adversarial Learning (RGAL). RGAL uses triplet-based, relation-aware losses in two alternating phases—image synthesis and student training—to simultaneously promote intra-class diversity and inter-class confusion, aided by a focal weighted sampling strategy. Empirical results across data-free KD, data-free quantization, and non-exemplar incremental learning show consistent accuracy gains and improved data efficiency, including large-scale ImageNet experiments. The approach demonstrates strong generalization and practical impact for privacy-preserving model compression and robust knowledge transfer without real data.

Abstract

Data-free knowledge distillation transfers knowledge by recovering training data from a pre-trained model. Despite the recent success of seeking global data diversity, the diversity within each class and the similarity among different classes are largely overlooked, resulting in data homogeneity and limited performance. In this paper, we introduce a novel Relation-Guided Adversarial Learning (RGAL) method with triplet losses, which addresses the homogeneity problem from two aspects. Specifically, our method aims to promote both intra-class diversity and inter-class confusion of the generated samples. To this end, we design two phases: an image synthesis phase and a student training phase. In the image synthesis phase, we construct an optimization process that pushes apart samples with the same label and pulls together samples with different labels, leading to intra-class diversity and inter-class confusion, respectively. Then, in the student training phase, we perform the opposite optimization, which adversarially attempts to reduce the distance between samples of the same class and enlarge the distance between samples of different classes. To mitigate the conflict between seeking high global diversity and keeping inter-class confusion, we propose a focal weighted sampling strategy that selects the negative in each triplet unevenly within a finite range of distances. RGAL shows significant improvement over previous state-of-the-art methods in accuracy and data efficiency. Moreover, RGAL can be plugged into state-of-the-art methods for various data-free knowledge transfer applications. Experiments on multiple benchmarks demonstrate the effectiveness and generalizability of our method on several tasks, especially data-free knowledge distillation, data-free quantization, and non-exemplar incremental learning. Our code is available at https://github.com/Sharpiless/RGAL.
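
The two opposite triplet objectives described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance between embeddings, a fixed margin of 0.5, and plain Python tuples in place of tensors; the function names (triplet_loss, reversed_triplet_loss, mean_over_triplets) are hypothetical, and the exact loss RGAL uses, including its weighting against other terms, is not specified here. In a real training loop the generator would minimize the reversed loss through a differentiable framework, while the student minimizes the standard one.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 0.5) -> float:
    """Standard triplet margin loss (student phase): pull the same-label
    positive closer and push the different-label negative away."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def reversed_triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 0.5) -> float:
    """Opposite direction (assumed generator phase): push the same-label
    positive away (intra-class diversity) and pull the different-label
    negative closer (inter-class confusion)."""
    return max(0.0, euclidean(anchor, negative) - euclidean(anchor, positive) + margin)


def mean_over_triplets(
    embs: Sequence[Vector],
    labels: Sequence[int],
    loss_fn: Callable[[Vector, Vector, Vector, float], float],
    margin: float = 0.5,
) -> float:
    """Average loss_fn over every valid (anchor, positive, negative) triplet in a batch."""
    total, count = 0.0, 0
    for a in range(len(embs)):
        for p in range(len(embs)):
            if p == a or labels[p] != labels[a]:
                continue  # the positive must share the anchor's label
            for n in range(len(embs)):
                if labels[n] == labels[a]:
                    continue  # the negative must carry a different label
                total += loss_fn(embs[a], embs[p], embs[n], margin)
                count += 1
    return total / count if count else 0.0


# Toy batch: two samples of class 0 and one sample of class 1.
embs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
labels = [0, 0, 1]
student_loss = mean_over_triplets(embs, labels, triplet_loss)
generator_loss = mean_over_triplets(embs, labels, reversed_triplet_loss)
# The student loss is already zero on this batch, while the generator loss is positive.
print(student_loss, generator_loss)
```

Because the two objectives mirror each other, any triplet that satisfies the student's margin (loss 0) incurs a reversed loss of at least twice the margin, which is the adversarial tension between the two phases.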

Paper Structure

This paper contains 17 sections, 18 equations, 11 figures, 17 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Illustration of sample optimization in different data generation methods. Red arrows indicate pushing away, while green arrows indicate pulling close. Both adversarial-based and contrastive methods ignore the relations between individual samples, resulting in limited intra-class diversity and inter-class confusion. Our method explicitly handles the distances among samples, achieving high intra-class diversity while maintaining inter-class confusion.
  • Figure 2: The framework of our proposed RGAL method for data-free knowledge distillation. A, P, and N represent the anchor sample, positive sample, and negative sample, respectively. Our method alternates between training the generator and training the student model, using triplet losses in opposite directions. Triplets are sampled with the distance weighted sampling strategy when training the student, and with the focal weighted sampling strategy when training the generator. Focal weighted sampling favors negatives that are neither too far nor too close, thus preserving global data diversity. The student's embeddings, in turn, are optimized in the opposite direction on the triplets drawn by distance weighted sampling.
  • Figure 3: Illustration of different sampling strategies and optimization directions of the positive and the negative. Red denotes pushing away and green denotes pulling close, where color opacity denotes sampling probability.
  • Figure 4: Visualization of samples in the embedding space on a typical three-class problem, comparing ZSKT (Micaelli et al., 2019) with our methods. Pseudo points are randomly initialized away from the data manifold. The first row shows the result of ZSKT, whose adversarial formulation is widely adopted by recent state-of-the-art methods. Both RGAL variants with the adversarial triplet loss eliminate dense sample clusters, and their samples are more widely distributed.
  • Figure 5: A batch of generated samples for knowledge transfer on CIFAR-10 from a trained WRN40-2 model, comparing our proposed RGAL with state-of-the-art methods. Compared with the others, the samples from RGAL show higher diversity and stronger inter-class confusion within the same batch.
  • ...and 6 more figures
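
The two negative-sampling strategies named in the Figure 2 and Figure 3 captions can be sketched as follows. The distance weighted probabilities follow the scheme of Wu et al. (2017), "Sampling Matters in Deep Embedding Learning", which we assume is what "distance weighted sampling" refers to here; the cutoff defaults of 0.5 and 1.4 are assumptions. The focal weighting is a hypothetical stand-in that only encodes the stated idea of favoring negatives that are neither too far nor too close within a finite distance range: its bell shape and its low, high and gamma parameters are our own choices, not the paper's formula.

```python
import math
import random
from typing import List, Sequence


def _normalize(weights: List[float]) -> List[float]:
    """Turn non-negative weights into probabilities; fall back to uniform if all are zero."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def distance_weighted_probs(
    dists: Sequence[float],
    dim: int,
    cutoff: float = 0.5,
    nonzero_loss_cutoff: float = 1.4,
) -> List[float]:
    """Distance weighted sampling in the style of Wu et al. (2017).

    Assumes unit-norm embeddings in `dim` >= 3 dimensions, so anchor-negative
    distances lie in [0, 2]. Each negative is weighted by 1 / q(d), where
    q(d) ~ d^(dim-2) * (1 - d^2/4)^((dim-3)/2) is the distance density for
    points spread uniformly on the hypersphere. Distances are clipped below
    at `cutoff`, and negatives at or beyond `nonzero_loss_cutoff` get zero weight.
    """
    log_w = []
    for d in dists:
        dc = max(d, cutoff)
        log_q = (dim - 2) * math.log(dc) + (dim - 3) / 2 * math.log(max(1e-8, 1 - dc * dc / 4))
        log_w.append(-log_q if d < nonzero_loss_cutoff else -math.inf)
    finite = [x for x in log_w if x != -math.inf]
    if not finite:
        return _normalize([0.0] * len(dists))
    m = max(finite)  # subtract the max log-weight for numerical stability
    return _normalize([math.exp(x - m) if x != -math.inf else 0.0 for x in log_w])


def focal_weighted_probs(
    dists: Sequence[float], low: float, high: float, gamma: float = 2.0
) -> List[float]:
    """HYPOTHETICAL focal weighting: only negatives with low <= d <= high are
    eligible, and the weight peaks at the middle of that range (requires
    high > low), so negatives that are neither too close nor too far are preferred."""
    center, half = (low + high) / 2, (high - low) / 2
    weights = [
        (1 - abs(d - center) / half) ** gamma if low <= d <= high else 0.0
        for d in dists
    ]
    return _normalize(weights)


def sample_negative(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one negative index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
dists = [0.1, 0.5, 1.0, 1.5, 1.9]  # anchor-to-candidate distances for one anchor
print(distance_weighted_probs(dists, dim=128))
print(focal_weighted_probs(dists, low=0.4, high=1.6))
print(sample_negative(focal_weighted_probs(dists, low=0.4, high=1.6), rng))
```

The contrast is the point of the sketch: distance weighted sampling spreads probability across all sufficiently close negatives, whereas the hypothetical focal weighting confines sampling to a finite distance band and concentrates it on mid-range negatives.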