FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression
Shiqi Jiang, Hui Yuan, Shuai Li, Huanqiang Zeng, Sam Kwong
TL;DR
This work targets screen content (SC) image compression by addressing three SC-specific challenges: learning compact latent features, controlling quantization granularity per frequency component, and the lack of large-scale SC data. It introduces FD-LSCIC, a frequency-decomposition learned image compression (LIC) framework built on four components: MToRB for multi-frequency feature extraction, CTSFRB for multi-scale fusion, MFCIM for cross-frequency context interaction, and AQ for adaptive per-frequency quantization. It also contributes SDU-SCICD10K, a dataset of more than 10,000 SC images collected from PC and mobile platforms. The method trains with a VAE-based rate-distortion (RD) objective and per-frequency entropy models, and reports substantial BD-rate reductions relative to H.266/VVC and state-of-the-art LIC methods on SC datasets, alongside favorable complexity and qualitative results. Ablation studies confirm that each module contributes, showing that multi-frequency processing and adaptive quantization materially improve SC compression performance and efficiency, with practical implications for SC-intensive applications.
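As a rough sketch of what a rate-distortion objective with per-frequency entropy models could look like, consider the form below. The notation is an assumption made for illustration and is not taken from the paper: $F$ frequency components, quantized latents $\hat{y}_f$, entropy models $p_f$, a distortion measure $D$ between the input $x$ and the reconstruction $\hat{x}$, and a trade-off weight $\lambda$.

```latex
\mathcal{L} \;=\; \sum_{f=1}^{F} \mathbb{E}\!\left[-\log_2 p_f(\hat{y}_f)\right] \;+\; \lambda \, D(x, \hat{x})
```

The first term estimates the bitrate summed over the frequency components, and the second penalizes reconstruction error. Any side information the model transmits would contribute additional rate terms of the same form.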
Abstract
Learned image compression (LIC) methods have already surpassed traditional techniques in compressing natural scene (NS) images. However, directly applying these methods to screen content (SC) images, which possess distinct characteristics such as sharp edges, repetitive patterns, and embedded text and graphics, yields suboptimal results. This paper addresses three key challenges in SC image compression: learning compact latent features, adapting quantization step sizes, and the lack of large SC datasets. To overcome these challenges, we propose a novel compression method that employs a multi-frequency two-stage octave residual block (MToRB) for feature extraction, a cascaded triple-scale feature fusion residual block (CTSFRB) for multi-scale feature integration, and a multi-frequency context interaction module (MFCIM) to reduce inter-frequency correlations. Additionally, we introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity. Furthermore, we construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed NS and SC images from both PC and mobile platforms. Experimental results demonstrate that our approach significantly improves SC image compression performance, outperforming traditional standards and state-of-the-art learning-based methods in terms of peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM).
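The idea of scaled uniform noise with a separate step size per frequency component can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the two component labels, and the fixed step sizes are all hypothetical, and in the proposed method the scale is learned rather than set by hand. The sketch assumes the common LIC practice of adding uniform noise during training as a differentiable stand-in for rounding, then rounding to the nearest multiple of the step size at inference.

```python
import random

def quantize_train(latent, step, rng):
    """Training-time proxy: add uniform noise in [-step/2, step/2]."""
    return [v + step * rng.uniform(-0.5, 0.5) for v in latent]

def quantize_infer(latent, step):
    """Inference-time: round each value to the nearest multiple of step."""
    return [step * round(v / step) for v in latent]

# Hypothetical per-frequency step sizes (fixed here; learned in the paper).
steps = {"low": 0.5, "high": 2.0}
# The same toy latent values are used for both components for comparison.
latents = {"low": [0.3, -1.2, 2.6], "high": [0.3, -1.2, 2.6]}

rng = random.Random(0)
noisy = {k: quantize_train(latents[k], steps[k], rng) for k in latents}
hard = {k: quantize_infer(latents[k], steps[k]) for k in latents}

print(hard["low"])   # fine granularity: values stay close to the input
print(hard["high"])  # coarse granularity: values snap to multiples of 2.0
```

A smaller step preserves more detail at a higher bitrate, while a larger step discards detail to save bits, which is why a per-component step gives control over quantization granularity.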
