Adapting Learned Image Codecs to Screen Content via Adjustable Transformations

H. Burak Dogaroglu, A. Burakhan Koyuncu, Atanas Boev, Elena Alshina, Eckehard Steinbach

TL;DR

Learned image codecs (LICs) struggle with screen content (SC) because of distribution shift. The authors wrap a fixed codec with parameterized, invertible linear transforms and two neural modules (a pre-filter and a post-filter), trained end to end, so that LICs adapt to screen content without retraining the codec itself. Using a desaturation-based forward transform plus CR/RS modules, they report up to $10\%$ BD-Rate savings on SC data with only about $1\%$ extra parameters across multiple baseline LICs. The results emphasize compatibility, modest compute overhead, and broad applicability. The approach offers a practical way to extend LIC performance to specialized domains and invites future work on domain-specific transforms and per-image coefficient optimization.

Abstract

As learned image codecs (LICs) become more prevalent, their low coding efficiency for out-of-distribution data becomes a bottleneck for some applications. To improve the performance of LICs for screen content (SC) images without breaking backwards compatibility, we propose to introduce parameterized and invertible linear transformations into the coding pipeline without changing the underlying baseline codec's operation flow. We design two neural networks to act as prefilters and postfilters in our setup to increase the coding efficiency and help with the recovery from coding artifacts. Our end-to-end trained solution achieves up to 10% bitrate savings on SC compression compared to the baseline LICs while introducing only 1% extra parameters.

Paper Structure

This paper contains 18 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Architecture of the residual network based on MBConv layers. The depicted implementation uses the $L = 8$, $C = 32$ setup.
  • Figure 2: Operational diagram of the proposed pipeline.
  • Figure 3: Compressibility of desaturation, PCA downsampling and PCA quantization. A lower bitrate indicates potential for a performance gain. While desaturation clearly allows images to be compressed into smaller files, the PCA downsampling and PCA quantization transformations have no visible effect on the bitrate.
  • Figure 4: Rate-distortion plot of different desaturation levels on top of the Ballé baseline codec. Reconstructed image quality drops more for stronger transformations. However, weaker transformations produce little to no bitrate gain, indicating a sweet spot in the quality-bitrate tradeoff.
  • Figure 5: Rate-distortion plot for our solution with the Ballé, Minnen and Cheng codecs as baselines.
  • ...and 1 more figure