Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution
Simiao Li, Yun Zhang, Wei Li, Hanting Chen, Wenjia Wang, Bingyi Jing, Shaohui Lin, Jie Hu
TL;DR
This work addresses the compression of image super-resolution (SR) models through knowledge distillation (KD) without tying the method to a single teacher-student architecture. It introduces MiPKD, a KD framework that operates at two granularities: a Feature Prior Mixer fuses teacher and student priors in a shared latent space, and a Block Prior Mixer dynamically mixes teacher and student network blocks. Training uses a multi-term loss that combines logits, feature, and block distillation signals. Across CNN and Transformer backbones, MiPKD yields consistent PSNR/SSIM gains over strong baselines, including when depth and width compression are combined, and ablations highlight the value of separate encoders, the 3D random masking strategy, and the auto-encoder auxiliary loss. The method offers a flexible, architecture-agnostic way to distill high-quality SR models suitable for deployment on resource-constrained devices.
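To make the feature-level mixing concrete, here is a minimal NumPy sketch of masked mixing in a shared latent space. It assumes 1x1-convolution-style linear encoders (the random matrices `w_t` and `w_s`), a Bernoulli 3D mask over channel, height and width with keep probability 0.5, and a plain weighted sum for the multi-term loss. All names, shapes and weights (`encode`, `random_mask_3d`, `mix_features`, `combined_kd_loss`) are hypothetical illustrations, not the authors' implementation, and the sketch stops at the mixed latent rather than modeling how it is decoded or consumed.

```python
import numpy as np

def encode(feat, weight):
    """Hypothetical 1x1-conv encoder: project a (C, H, W) map to (D, H, W)."""
    # einsum applies the (D, C) weight matrix at every spatial position
    return np.einsum("dc,chw->dhw", weight, feat)

def random_mask_3d(shape, keep_prob, rng):
    """Bernoulli mask drawn independently over channel, height and width."""
    return (rng.random(shape) < keep_prob).astype(np.float64)

def mix_features(f_t, f_s, w_t, w_s, keep_prob, rng):
    """Encode teacher/student features with separate encoders, then mix them."""
    z_t = encode(f_t, w_t)  # teacher prior in the shared latent space
    z_s = encode(f_s, w_s)  # student feature in the same latent space
    m = random_mask_3d(z_t.shape, keep_prob, rng)
    # mask == 1 takes the teacher latent, mask == 0 takes the student latent
    return m * z_t + (1.0 - m) * z_s, m

def combined_kd_loss(l_rec, l_logit, l_feat, l_block,
                     w_logit=1.0, w_feat=1.0, w_block=1.0):
    """Hypothetical weighted sum of a reconstruction term and three KD terms."""
    return l_rec + w_logit * l_logit + w_feat * l_feat + w_block * l_block

rng = np.random.default_rng(0)
f_t = rng.standard_normal((64, 8, 8))      # wider teacher feature map
f_s = rng.standard_normal((32, 8, 8))      # narrower student feature map
w_t = rng.standard_normal((48, 64)) * 0.1  # hypothetical teacher encoder
w_s = rng.standard_normal((48, 32)) * 0.1  # hypothetical student encoder
z_mix, mask = mix_features(f_t, f_s, w_t, w_s, keep_prob=0.5, rng=rng)
print(z_mix.shape)  # (48, 8, 8)
```

Separate encoders let teacher and student maps of different widths land in one latent space before any element-wise comparison or mixing, which is the step that simple averaging or dot-product alignment skips.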
Abstract
Knowledge distillation (KD) is a promising yet challenging model compression technique that transfers rich learned representations from a well-performing but cumbersome teacher model to a compact student model. Previous KD methods for image super-resolution (SR) mostly compare feature maps directly, or after standardizing their dimensions with basic algebraic operations (e.g., averaging or dot products). However, these methods overlook the intrinsic semantic differences among feature maps, which arise from the disparate expressive capacities of the two networks. This work presents MiPKD, a multi-granularity mixture of prior KD framework, which facilitates efficient SR models through feature mixture in a unified latent space and stochastic network block mixture. Extensive experiments demonstrate the effectiveness of the proposed MiPKD method.
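The stochastic block mixture can likewise be sketched in NumPy. Here the teacher and student are each split into the same number of stages, a random binary choice per stage decides which network's stage runs, and hypothetical linear adapters (`s2t`, `t2s`) move activations between teacher width and student width whenever the route switches networks. The stage definitions, adapters and uniform sampling are assumptions for illustration only; this section does not specify how MiPKD partitions blocks or aligns widths.

```python
import numpy as np

rng = np.random.default_rng(0)
C_T, C_S, K = 16, 8, 4  # teacher width, student width, stage count (hypothetical)

def make_stage(c, rng):
    """Hypothetical residual ReLU block operating on a width-c vector."""
    w = rng.standard_normal((c, c)) / np.sqrt(c)
    return lambda x: np.maximum(w @ x, 0.0) + x

teacher_stages = [make_stage(C_T, rng) for _ in range(K)]
student_stages = [make_stage(C_S, rng) for _ in range(K)]
s2t = rng.standard_normal((C_T, C_S)) / np.sqrt(C_S)  # hypothetical student->teacher adapter
t2s = rng.standard_normal((C_S, C_T)) / np.sqrt(C_T)  # hypothetical teacher->student adapter

def mixed_forward(x_s, choices):
    """Run K stages; choices[k] == 1 uses the teacher stage, 0 the student stage.

    Input and output are in student width; adapters are applied only when
    the route crosses from one network to the other.
    """
    h, in_teacher_space = x_s, False
    for k, use_teacher in enumerate(choices):
        if use_teacher and not in_teacher_space:
            h, in_teacher_space = s2t @ h, True
        elif not use_teacher and in_teacher_space:
            h, in_teacher_space = t2s @ h, False
        h = teacher_stages[k](h) if use_teacher else student_stages[k](h)
    if in_teacher_space:
        h = t2s @ h  # return to student width at the end
    return h

x_s = rng.standard_normal(C_S)
route = rng.integers(0, 2, size=K)  # 1 = teacher stage, 0 = student stage
out = mixed_forward(x_s, route)
print(out.shape)  # (8,)
```

With every choice set to 0 the route reduces to the plain student network. Under this sketch's assumptions, sampling a fresh route each iteration trains student stages alongside teacher stages placed before and after them.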
