FEDS: Feature and Entropy-Based Distillation Strategy for Efficient Learned Image Compression
Haisheng Fu, Jie Liang, Zhenman Fang, Jingning Han
TL;DR
This work tackles the practical burden of learned image compression (LIC) by introducing FEDS, a feature and entropy-based distillation framework that transfers knowledge from a high-capacity teacher augmented with Swin Transformer V2 attention to a compact student. It combines feature alignment with entropy-driven selection of latent channels, applied within a three-phase training regime that preserves rate-distortion performance while sharply reducing parameters and speeding up encoding and decoding. Experiments on Kodak, Tecnick, and CLIC show that the student nearly matches the teacher, with a BD-Rate gap of only 0.55% to 1.24%, while achieving roughly a 2.9× speedup and a ~63% reduction in parameters. This makes LIC more viable for real-time and resource-constrained settings. The approach also generalizes to transformer-based architectures and is supported by an information-theoretic justification linking mutual information, entropy, and KL-based feature transfer.
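The summary mentions KL-based feature transfer without saying how intermediate features become probability distributions. The sketch below is one common choice and is an assumption, not the authors' confirmed formulation: each channel's activations are softmax-normalized over spatial positions with a temperature `tau`, and the loss is the channel-averaged KL(teacher ‖ student). The names `kl_feature_loss`, `log_softmax`, and `tau` are hypothetical.

```python
import math
from typing import List

# Features as channels x flattened spatial positions (H*W).
Features = List[List[float]]


def log_softmax(xs: List[float], tau: float) -> List[float]:
    """Temperature-scaled log-softmax, computed with the log-sum-exp trick."""
    scaled = [x / tau for x in xs]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - lse for v in scaled]


def kl_feature_loss(teacher: Features, student: Features, tau: float = 1.0) -> float:
    """Mean over channels of KL(p_teacher || p_student), where each channel is
    normalized into a distribution over spatial positions.

    Assumes both feature maps already share a shape (e.g. after a
    hypothetical 1x1 projection on the student side)."""
    if not teacher or len(teacher) != len(student) or any(
        len(t) != len(s) for t, s in zip(teacher, student)
    ):
        raise ValueError("teacher and student features must have the same non-empty shape")
    total = 0.0
    for t_ch, s_ch in zip(teacher, student):
        log_p = log_softmax(t_ch, tau)
        log_q = log_softmax(s_ch, tau)
        # KL(p || q) = sum_i p_i * (log p_i - log q_i)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    # The tau**2 factor is the usual distillation convention for keeping the
    # loss scale roughly comparable across temperatures.
    return (tau ** 2) * total / len(teacher)
```

Identical teacher and student features give a loss of zero, and the loss grows as the student's spatial activation pattern diverges from the teacher's. Working in log space avoids division by zero when a probability underflows.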
Abstract
Learned image compression (LIC) methods have recently outperformed traditional codecs such as VVC in rate-distortion performance. However, their large models and high computational costs have limited their practical adoption. In this paper, we first construct a high-capacity teacher model by integrating Swin Transformer V2-based attention modules, additional residual blocks, and expanded latent channels, which improves compression performance. Building on this foundation, we propose a \underline{F}eature and \underline{E}ntropy-based \underline{D}istillation \underline{S}trategy (\textbf{FEDS}) that transfers key knowledge from the teacher to a lightweight student model. Specifically, we align intermediate feature representations and emphasize the most informative latent channels through an entropy-based loss. A staged training scheme refines this transfer in three phases: feature alignment, channel-level distillation, and final fine-tuning. Our student model nearly matches the teacher on Kodak (1.24\% BD-Rate increase), Tecnick (1.17\%), and CLIC (0.55\%) while reducing the parameter count by about 63\% and accelerating encoding/decoding by around 73\%. Moreover, ablation studies indicate that FEDS generalizes effectively to transformer-based networks. These results demonstrate that our approach strikes a compelling balance among compression performance, speed, and model size, making it well suited for real-time or resource-limited scenarios.
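To make the entropy-based channel loss and the three-phase schedule more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. We assume each teacher latent channel's informativeness is measured by the empirical entropy of its rounded values, that only the `top_k` highest-entropy channels are distilled (weighted by their share of that entropy), and that the student is penalized with a weighted mean squared error. The phase boundaries (`align_end`, `distill_end`) and loss weights in `phase_loss_weights` are hypothetical placeholders.

```python
import math
from collections import Counter
from typing import Dict, List

# Latents as channels x flattened spatial positions.
Latent = List[List[float]]


def channel_entropy(values: List[float]) -> float:
    """Empirical entropy (bits) of one channel after rounding to integers,
    mimicking latent quantization (Python's round() breaks ties to even)."""
    counts = Counter(round(v) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(teacher: Latent, top_k: int) -> List[float]:
    """Weight the top_k highest-entropy teacher channels by their share of the
    selected entropy; all other channels get weight 0."""
    ents = [channel_entropy(ch) for ch in teacher]
    ranked = sorted(range(len(ents)), key=lambda i: ents[i], reverse=True)
    keep = ranked[:top_k]
    total = sum(ents[i] for i in keep)
    weights = [0.0] * len(ents)
    for i in keep:
        # Fall back to uniform weights if every selected channel is constant.
        weights[i] = ents[i] / total if total > 0 else 1.0 / len(keep)
    return weights


def channel_distill_loss(teacher: Latent, student: Latent, top_k: int) -> float:
    """Entropy-weighted mean squared error between teacher and student latents."""
    weights = entropy_weights(teacher, top_k)
    loss = 0.0
    for w, t_ch, s_ch in zip(weights, teacher, student):
        if w > 0:
            loss += w * sum((t - s) ** 2 for t, s in zip(t_ch, s_ch)) / len(t_ch)
    return loss


def phase_loss_weights(epoch: int, align_end: int = 100,
                       distill_end: int = 200) -> Dict[str, float]:
    """Hypothetical three-phase schedule: feature alignment, then channel-level
    distillation, then fine-tuning on the rate-distortion loss alone."""
    if epoch < align_end:
        return {"rd": 1.0, "feature": 1.0, "channel": 0.0}
    if epoch < distill_end:
        return {"rd": 1.0, "feature": 0.0, "channel": 1.0}
    return {"rd": 1.0, "feature": 0.0, "channel": 0.0}
```

For example, with a constant teacher channel and a second channel taking four distinct values, `entropy_weights(teacher, 1)` assigns all weight to the second channel, so errors on the constant channel are ignored. Restricting the loss to high-entropy channels focuses the student's limited capacity on the latents that carry the most information.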
