SDAKD: Student Discriminator Assisted Knowledge Distillation for Super-Resolution Generative Adversarial Networks

Nikolaos Kaparinos, Vasileios Mezaris

TL;DR

This work tackles the challenge of compressing high-quality super-resolution GANs for deployment on resource-constrained devices. It introduces SDAKD, which adds a student discriminator and a three-stage training pipeline combining supervised learning, feature-map distillation, and adversarial training to mitigate the capacity mismatch between the teacher discriminator and a smaller student generator. The method adapts MLP-based feature-map distillation to networks with reduced channel counts and trains the student networks through staged supervision before full adversarial training. Experiments on GCFSR and Real-ESRGAN show that SDAKD consistently surpasses state-of-the-art GAN knowledge distillation approaches in FID across multiple upscaling factors, and ablations confirm the critical role of both the student discriminator and the staged training strategy. The approach enables significant inference speedups while preserving perceptual quality, and the authors plan to release the code publicly.

Abstract

Generative Adversarial Networks (GANs) achieve excellent performance in generative tasks, such as image super-resolution, but their computational requirements make them difficult to deploy on resource-constrained devices. While knowledge distillation is a promising research direction for GAN compression, effectively training a smaller student generator is challenging due to the capacity mismatch between the student generator and the teacher discriminator. In this work, we propose Student Discriminator Assisted Knowledge Distillation (SDAKD), a novel GAN distillation methodology that introduces a student discriminator to mitigate this capacity mismatch. SDAKD follows a three-stage training strategy and integrates an adapted feature-map distillation approach in its last two training stages. We evaluated SDAKD on two well-performing super-resolution GANs, GCFSR and Real-ESRGAN. Our experiments demonstrate consistent improvements over the baselines and state-of-the-art (SOTA) GAN knowledge distillation methods. The SDAKD source code will be made openly available upon acceptance of the paper.
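
The staged structure described in the abstract can be pictured as a schedule that switches loss terms on per stage. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract confirms three stages and that feature-map distillation is used in the last two, but the names `StageConfig`, `STAGES`, and `total_loss`, the choice to keep the supervised term active in every stage, the restriction of the adversarial term to the final stage, and the unit default weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Which loss terms are active in one training stage (hypothetical layout)."""
    name: str
    supervised: bool
    feature_distillation: bool
    adversarial: bool


# Assumed schedule: only "feature-map distillation in the last two stages"
# comes from the abstract; the other flags are illustrative guesses.
STAGES = (
    StageConfig("stage_1", supervised=True, feature_distillation=False, adversarial=False),
    StageConfig("stage_2", supervised=True, feature_distillation=True, adversarial=False),
    StageConfig("stage_3", supervised=True, feature_distillation=True, adversarial=True),
)


def total_loss(stage, losses, weights=None):
    """Sum the loss terms that are switched on for the given stage.

    `losses` maps term name -> scalar value; `weights` optionally maps
    term name -> multiplier (defaults to 1.0, an assumption).
    """
    weights = weights or {}
    active = {
        "supervised": stage.supervised,
        "feature_distillation": stage.feature_distillation,
        "adversarial": stage.adversarial,
    }
    return sum(weights.get(term, 1.0) * losses[term]
               for term, enabled in active.items() if enabled)


if __name__ == "__main__":
    example = {"supervised": 1.0, "feature_distillation": 2.0, "adversarial": 4.0}
    for stage in STAGES:
        print(stage.name, total_loss(stage, example))
```

In a real training loop, each scalar in `losses` would be computed from the student and teacher networks; the sketch only shows how a stage flag could gate which terms contribute to the objective.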

Paper Structure

This paper contains 16 sections, 4 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of SDAKD, our proposed GAN knowledge distillation methodology.
  • Figure 2: Comparison of the discriminator's output distribution under DCD versus modified DCD, when the discriminator is given as input an image generated by the student generator.
  • Figure 3: Output distributions of the discriminator given a generated image as input, using the pre-trained teacher versus the student discriminator, for GCFSR (top) / Real-ESRGAN (bottom).
  • Figure 4: Qualitative results of 32× upsampling using our proposed SDAKD methodology on the GCFSR network. The SDAKD student models have 1/2, 1/4 and 1/8 the number of channels of the original generator network. Results are compared against the ground truth and the original GCFSR model, which was also used as the teacher for the knowledge distillation. Samples are taken from our test set, derived from the CelebA-HQ dataset. The input image size is 32×32 pixels, while the output size is 1024×1024 pixels.