Rethinking Learned Image Compression: Context is All You Need

Jixiang Luo

TL;DR

The paper investigates the boundary of learned image compression (LIC) under PSNR-based rate-distortion by dissecting how scaling encoder, decoder, context, and dataset affect performance. It finds that gains mainly come from expanding the context model and, to some extent, the decoder, with overfitting on the Kodak dataset acting as a powerful context signal that yields substantial BD-RATE improvements over VVC. Introducing a ChannelAttenBlock and adaptive quantization, the authors show that context-driven RD improvements dominate, though excessive context capacity can hurt PSNR, revealing a nuanced trade-off along the RD frontier. The work highlights data-driven context modeling as the key lever for LIC gains, while also emphasizing generalization risks when relying on overfitted, dataset-specific context. Practically, this suggests directions for constructing more scalable LIC systems that balance context richness, dataset breadth, and entropy modeling to approach the LIC boundary more robustly.
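For reference, the "PSNR-based rate-distortion" setting mentioned above is conventionally written as follows. This is the standard formulation from the LIC literature, not a reproduction of the paper's own equations, and the symbols $R$, $D$, $\lambda$, $x_i$, $\hat{x}_i$, $N$, and $\mathrm{MAX}$ are assumed notation:

$$
\mathcal{L} = R + \lambda D, \qquad D = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2, \qquad \mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}^2}{D}
$$

Here $R$ is the estimated bitrate (typically in bits per pixel), $D$ is the mean squared error between the original pixels $x_i$ and the reconstructed pixels $\hat{x}_i$, $\mathrm{MAX}$ is the peak pixel value (255 for 8-bit images), and $\lambda$ sets the trade-off between rate and distortion. Because PSNR is a monotone function of $D$, lowering MSE at a fixed rate is equivalent to raising PSNR.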

Abstract

Learned image compression (LIC) has recently made rapid progress compared with traditional methods, which prompts the question this paper addresses: "Where is the boundary of learned image compression?" The paper splits this question into two sub-problems: 1) Where is the boundary of rate-distortion performance as measured by PSNR? 2) How can the compression gain be improved further to reach that boundary? To answer them, the paper analyzes the effect of scaling the parameters of the encoder, the decoder, and the context model, the three components of LIC. It concludes that scaling LIC amounts to scaling the context model and the decoder. Extensive experiments demonstrate that overfitting can actually serve as an effective context. By optimizing the context, the paper further improves PSNR and achieves state-of-the-art performance, with a BD-RATE gain of 14.39% over VVC.
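The headline result is reported as BD-RATE, the average bitrate difference between two codecs at equal quality. The sketch below illustrates the idea in plain Python under stated assumptions: it uses piecewise-linear interpolation of log-bitrate over PSNR, whereas the standard Bjøntegaard procedure fits cubic curves, so its numbers will not match official tools exactly. The function names `bd_rate` and `_interp` and all data points are hypothetical and are not taken from the paper.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over strictly increasing xs."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the interpolation range")


def bd_rate(anchor, test, samples=1000):
    """Average bitrate change (percent) of `test` relative to `anchor`.

    Each curve is a list of (bitrate, psnr_db) points sorted by increasing
    PSNR. A negative result means `test` needs fewer bits at equal PSNR.
    Simplification: linear interpolation instead of the usual cubic fit.
    """
    a_psnr = [p for _, p in anchor]
    a_logr = [math.log(r) for r, _ in anchor]
    t_psnr = [p for _, p in test]
    t_logr = [math.log(r) for r, _ in test]

    # Compare only over the PSNR range covered by both curves.
    lo = max(a_psnr[0], t_psnr[0])
    hi = min(a_psnr[-1], t_psnr[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in PSNR")

    # Midpoint-rule average of the log-bitrate difference.
    step = (hi - lo) / samples
    total = 0.0
    for i in range(samples):
        q = lo + (i + 0.5) * step
        total += _interp(q, t_psnr, t_logr) - _interp(q, a_psnr, a_logr)

    return (math.exp(total / samples) - 1.0) * 100.0


# Hypothetical rate-distortion points: (bits per pixel, PSNR in dB).
anchor_curve = [(0.2, 30.0), (0.4, 33.0), (0.8, 36.0)]
# A codec that spends 10% fewer bits at every quality level.
test_curve = [(r * 0.9, p) for r, p in anchor_curve]

print(round(bd_rate(anchor_curve, test_curve), 2))
```

With these made-up curves the function reports a change of about -10%, that is, a 10% bitrate saving. Under this sign convention a reported "gain" over VVC corresponds to a negative BD-RATE value; whether the paper quotes the magnitude or the signed value is not stated in this excerpt.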

Paper Structure (14 sections, 7 equations, 5 figures, 4 tables)

Introduction
Problem Formatting
Motivation
Methods
Scaling for Encoder
Scaling for Decoder
Scaling for Context
Scaling for Dataset
Experiment
Datasets
Settings
Results
Ablation experiments
Discussion

Figures (5)

Figure 1: Channel attention for the context model. $1\times1$ and $5\times5$ are the kernel sizes of the convolutional layers, ReLU and softmax are the activation functions, and $c_{in}$ is the number of input channels.
Figure 2: PSNR on the Kodak dataset. "Overfitting" denotes models trained on the Kodak dataset.
Figure 3: Left: PSNR on the CLIC Professional dataset. Right: PSNR on the Technick dataset.
Figure 4: Rate-distortion performance versus the number of channels in the context model.
Figure 5: Left: the influence of the training dataset. Right: the influence of adaptive quantization.
