EVC: Towards Real-Time Neural Image Compression with Mask Decay
Guo-Hua Wang, Jiahao Li, Bin Li, Yan Lu
TL;DR
The paper tackles the practicality gap in neural image compression by proposing EVC, a single-model, real-time codec that supports variable rate-distortion (RD) trade-offs through adjustable quantization steps, maintaining competitive RD performance while delivering significant speedups. The core innovations are mask decay, which converts a large teacher model into smaller student encoders, and a novel sparsity loss that overcomes limitations of traditional L1/L2 pruning losses. A scalable encoder is introduced via residual representation learning (RRL), enabling multiple encoders to share one decoder and progressively close the gap to the teacher while remaining efficient. Empirical results demonstrate 30 FPS for 768×512 inputs (and for 1920×1080 inputs with the small model), BD-rate improvements over the traditional codec VVC, and superiority to prior scalable approaches such as SlimCAE, highlighting the practical impact for real-time neural image compression.
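To make the idea of adjustable quantization steps concrete, the following is a minimal sketch of uniform scalar quantization with a tunable step. It is not EVC's implementation: the function names, the toy latent values, and the use of the number of distinct symbols as a rough proxy for rate are all assumptions made for illustration.

```python
def quantize(latent, step):
    """Uniform scalar quantization: snap each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in latent]


def distinct_symbols(latent, step):
    """Count distinct quantization indices; used here as a crude stand-in for rate."""
    return len({round(v / step) for v in latent})


# Hypothetical latent values, chosen only for illustration.
latent = [-2.7, -1.2, -0.4, 0.1, 0.6, 1.3, 2.9]

for step in (0.5, 1.0, 2.0):
    reconstructed = quantize(latent, step)
    max_error = max(abs(v - q) for v, q in zip(latent, reconstructed))
    print(step, distinct_symbols(latent, step), round(max_error, 3))
```

A larger step leaves fewer distinct symbols to encode (lower rate) at the cost of a larger reconstruction error (higher distortion), which is how a single model can cover several RD trade-offs.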
Abstract
Neural image compression has surpassed state-of-the-art traditional codecs (H.266/VVC) in rate-distortion (RD) performance, but suffers from high complexity and requires separate models for different rate-distortion trade-offs. In this paper, we propose an Efficient single-model Variable-bit-rate Codec (EVC), which runs at 30 FPS on 768x512 input images and still outperforms VVC in RD performance. By further reducing both encoder and decoder complexity, our small model even achieves 30 FPS on 1920x1080 input images. To bridge the performance gap between our models of different capacities, we design mask decay, which automatically transforms the large model's parameters into the small model. We also propose a novel sparsity regularization loss to mitigate the shortcomings of $L_p$ regularization. Our algorithm narrows the performance gap by 50% and 30% for our medium and small models, respectively. Finally, we advocate a scalable encoder for neural image compression, whose encoding complexity can be adjusted dynamically to meet different latency requirements. We propose decaying the large encoder multiple times to reduce the residual representation progressively. Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder. Our code is at https://github.com/microsoft/DCVC.
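As a rough illustration of the general mechanism behind mask decay, the sketch below attaches a soft mask to each channel and shrinks the masks of channels marked for removal from 1 to 0, after which those channels can be dropped. It is an assumption-laden toy, not the paper's algorithm: the constant decay step is a plain stand-in for the proposed sparsity loss, and the names `decay_masks`, `apply_masks`, `prune`, `lr`, and `strength` are hypothetical.

```python
def decay_masks(masks, prune, lr=0.25, strength=1.0):
    """One decay step: shrink the masks of channels marked for pruning toward 0.

    The constant step `lr * strength` is an illustrative stand-in,
    not the sparsity loss proposed in the paper.
    """
    return [max(0.0, m - lr * strength) if p else m for m, p in zip(masks, prune)]


def apply_masks(features, masks):
    """Scale each channel's feature by its mask (a mask of 0 silences the channel)."""
    return [f * m for f, m in zip(features, masks)]


# Four channels, all fully active at the start (mask = 1).
masks = [1.0, 1.0, 1.0, 1.0]
# Hypothetical choice of which channels the smaller model should lose.
prune = [False, True, False, True]

steps = 0
while any(m > 0.0 for m, p in zip(masks, prune) if p):
    masks = decay_masks(masks, prune)
    steps += 1

kept = [i for i, m in enumerate(masks) if m > 0.0]
print(steps, masks, kept)  # 4 [1.0, 0.0, 1.0, 0.0] [0, 2]
```

In an actual training loop the decay would be interleaved with the usual RD optimization, so the remaining channels can absorb the function of the decayed ones gradually instead of being cut off in one step.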
