EVC: Towards Real-Time Neural Image Compression with Mask Decay

Guo-Hua Wang; Jiahao Li; Bin Li; Yan Lu

TL;DR

The paper tackles the practicality gap in neural image compression by proposing EVC, a single-model, real-time codec that supports variable bit rates through adjustable quantization steps, maintaining competitive RD performance while delivering significant speedups. The core innovations are mask decay, which transforms a large teacher model into smaller student models, and a novel sparsity loss that overcomes the limitations of traditional L1/L2 regularization. A scalable encoder is introduced via residual representation learning (RRL), which lets multiple encoders share one decoder and progressively narrow the gap to the teacher while remaining efficient. Empirical results show 30 FPS for 768×512 inputs (and for 1920×1080 inputs with the small model), BD-rate improvements over VVC, and superiority to prior scalable approaches such as SlimCAE, highlighting the practical impact for real-time neural image compression.
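The TL;DR attributes EVC's variable-rate behavior to adjustable quantization steps. The sketch below shows, in general terms, why one model can cover several rates this way: dividing a latent by a larger step before rounding yields fewer distinct, more concentrated symbols (lower rate) at the cost of larger reconstruction error. Assumptions: Gaussian samples stand in for a real encoder's latent, the empirical entropy of the integer symbols is used as a crude rate proxy (a real codec uses a learned entropy model), and `quantize_with_step` and `empirical_entropy_bits` are hypothetical helpers, not EVC's code.

```python
import math
import random
from collections import Counter


def quantize_with_step(latent, q_step):
    """Hypothetical helper: uniform scalar quantization with step size q_step.

    Returns the integer symbols an entropy coder would transmit and the
    dequantized values the decoder would reconstruct.
    """
    if q_step <= 0:
        raise ValueError("q_step must be positive")
    symbols = [round(v / q_step) for v in latent]  # integers to be entropy-coded
    recon = [s * q_step for s in symbols]          # decoder-side reconstruction
    return symbols, recon


def empirical_entropy_bits(symbols):
    """Crude rate proxy: empirical entropy of the symbols, in bits per symbol."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


rng = random.Random(0)
# Stand-in for an encoder's latent (assumption: zero-mean Gaussian, std 4).
latent = [rng.gauss(0.0, 4.0) for _ in range(10_000)]

results = []  # (q_step, bits per symbol, mean squared error)
for q_step in (0.5, 1.0, 2.0, 4.0):
    symbols, recon = quantize_with_step(latent, q_step)
    rate = empirical_entropy_bits(symbols)
    mse = sum((a - b) ** 2 for a, b in zip(latent, recon)) / len(latent)
    results.append((q_step, rate, mse))
    print(f"q_step={q_step}: {rate:.2f} bits/symbol, MSE={mse:.4f}")
```

Each doubling of the step lowers the rate proxy by roughly one bit per symbol while the distortion grows, which is the rate-distortion knob a single variable-rate model exposes.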

Abstract

Neural image compression has surpassed the state-of-the-art traditional codec (H.266/VVC) in rate-distortion (RD) performance, but it suffers from high complexity and requires separate models for different rate-distortion trade-offs. In this paper, we propose an Efficient single-model Variable-bit-rate Codec (EVC), which runs at 30 FPS with 768×512 input images and still outperforms VVC in RD performance. By further reducing both encoder and decoder complexities, our small model even achieves 30 FPS with 1920×1080 input images. To bridge the performance gap between our models of different capacities, we meticulously design mask decay, which automatically transforms the large model's parameters into the small model. We also propose a novel sparsity regularization loss to mitigate the shortcomings of $L_p$ regularization. Our algorithm significantly narrows the performance gap, by 50% and 30% for our medium and small models, respectively. Finally, we advocate a scalable encoder for neural image compression, whose encoding complexity can be adjusted dynamically to meet different latency requirements. We propose decaying the large encoder multiple times to reduce the residual representation progressively. Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder. Our code is at https://github.com/microsoft/DCVC.
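The abstract says the new sparsity loss mitigates shortcomings of $L_p$ regularization without spelling them out. The sketch below illustrates two standard properties of the L1 and L2 penalties that are commonly cited as such shortcomings: the L2 gradient vanishes near zero, so a small weight is never driven exactly to zero, while the L1 gradient has constant magnitude, so a large, important weight is shrunk just as hard as a near-redundant one. It does not implement the paper's own sparsity loss, whose formula is not given in this summary; the weights, learning rate, and step count are arbitrary illustrative choices, and only the penalty term is optimized (no task loss).

```python
import math


def l1_grad(x):
    """Gradient of the L1 penalty |x|: magnitude 1 regardless of |x|."""
    return math.copysign(1.0, x) if x != 0.0 else 0.0


def l2_grad(x):
    """Gradient of the L2 penalty x**2: shrinks linearly as x approaches zero."""
    return 2.0 * x


def decay_under_penalty(x, grad_fn, lr=0.01, steps=50):
    """Plain gradient descent on the penalty alone (no task loss)."""
    for _ in range(steps):
        new_x = x - lr * grad_fn(x)
        # If a step would cross zero, stop at zero instead of oscillating.
        x = 0.0 if new_x * x < 0.0 else new_x
    return x


small, large = 0.05, 1.0  # a near-redundant weight and an important one
after = {
    name: (decay_under_penalty(small, fn), decay_under_penalty(large, fn))
    for name, fn in (("L1", l1_grad), ("L2", l2_grad))
}
for name, (s, l) in after.items():
    print(f"{name}: small 0.05 -> {s:.4f}, large 1.0 -> {l:.4f}")
```

After 50 steps, L1 zeroes the small weight but also halves the large one (1.0 to 0.5), whereas L2 leaves the small weight at about 0.018 instead of exactly zero.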

Paper Structure (19 sections, 8 equations, 20 figures, 10 tables)

Introduction
Related works
Methodology
  Our EVC for image compression
  Improve the student by mask decay
  The scalable encoder
Experiments
  Comparison with state-of-the-art
  Mask decay
  The scalable encoder
  Ablation studies
Conclusions and future works
The framework of EVC
Mask decay
The scalable encoder
...and 4 more sections

Figures (20)

Figure 1: The trade-off between BD-rate and complexity on Kodak, with VTM as the anchor. The two subfigures show the performance improvements from our mask decay and from residual representation learning (RRL), respectively. We cite results from SwinT-Hyperprior (SwinT) for comparison.
Figure 2: The overall framework of EVC.
Figure 3: The architectures of our encoder and decoder. All mask layers are merged into conv. layers after training. For simplicity, we omit the Leaky ReLUs within the Depth-Conv block.
Figure 4: The first subfigure presents our pruning method: a mask layer is inserted between two conv. layers, and merging the mask layer into them after training turns these two cumbersome conv. layers into efficient ones. The second subfigure compares the gradient $\frac{\partial \mathcal{L}_{sparse}}{\partial x}$ of different sparsity loss functions.
Figure 5: Illustration of compressing the encoder multiple times to learn residual representations progressively. "Enc", "Dec", and "E" denote the encoder, the decoder, and the entropy module, respectively.
...and 15 more figures
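Figure 4 states that the mask layer between two conv. layers is merged into them after training. A minimal sketch of why that is possible, under stated assumptions: the mask is a per-channel, non-negative scale applied directly to the first conv's output; 1×1 convolutions at a single pixel (plain matrix-vector products) stand in for general convolutions; and all helper names (`masked_forward`, `merge_mask`, and so on) are hypothetical, not taken from the EVC code.

```python
import random


def matvec(W, x):
    """Matrix-vector product; a 1x1 convolution at one pixel is exactly this."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def masked_forward(x, W1, b1, mask, W2, b2):
    """conv1 -> per-channel mask -> conv2, as during training."""
    h = [m * (z + b) for m, z, b in zip(mask, matvec(W1, x), b1)]
    return [z + b for z, b in zip(matvec(W2, h), b2)]


def plain_forward(x, W1, b1, W2, b2):
    """conv1 -> conv2 with no mask layer, as after merging."""
    h = [z + b for z, b in zip(matvec(W1, x), b1)]
    return [z + b for z, b in zip(matvec(W2, h), b2)]


def merge_mask(W1, b1, mask, W2):
    """Fold the mask into conv1 and drop zero-mask channels from both convs."""
    keep = [k for k, m in enumerate(mask) if m != 0.0]
    W1_new = [[mask[k] * w for w in W1[k]] for k in keep]  # scale kept output channels
    b1_new = [mask[k] * b1[k] for k in keep]
    W2_new = [[row[k] for k in keep] for row in W2]        # drop matching input channels
    return W1_new, b1_new, W2_new


rng = random.Random(0)
c_in, c_mid, c_out = 3, 4, 2
W1 = [[rng.uniform(-1, 1) for _ in range(c_in)] for _ in range(c_mid)]
b1 = [rng.uniform(-1, 1) for _ in range(c_mid)]
W2 = [[rng.uniform(-1, 1) for _ in range(c_mid)] for _ in range(c_out)]
b2 = [rng.uniform(-1, 1) for _ in range(c_out)]
mask = [0.7, 0.0, 1.3, 0.0]  # illustrative: two channels have decayed to exactly zero

W1_new, b1_new, W2_new = merge_mask(W1, b1, mask, W2)
x = [rng.uniform(-1, 1) for _ in range(c_in)]
y_masked = masked_forward(x, W1, b1, mask, W2, b2)
y_merged = plain_forward(x, W1_new, b1_new, W2_new, b2)
print(f"middle channels: {c_mid} -> {len(W1_new)}")
print(y_masked, y_merged)
```

Channels whose mask is exactly zero disappear from the first conv's outputs and the second conv's inputs, so the merged pair computes the same result with fewer channels. Because a Leaky ReLU satisfies LeakyReLU(m·z) = m·LeakyReLU(z) for m ≥ 0, the same folding would still hold if such an activation sat between the mask and the second conv.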

Theorems & Definitions (2)

Claim 1
Proof
