OMR-NET: a two-stage octave multi-scale residual network for screen content image compression

Shiqi Jiang; Ting Ren; Congrui Fu; Shuai Li; Hui Yuan

TL;DR

The paper tackles screen content (SC) image compression by exploiting the distinct frequency characteristics and repetitive patterns of SC images. It introduces an SC-specific learned image compression (LIC) framework built on improved two-stage octave residual blocks (IToRB), cascaded two-stage multi-scale residual blocks (CTMSRB), and a window-based attention module (WAM), together with a dedicated SC dataset, SDU-SCICD2K. The model employs a hyperprior-based entropy model and a loss that combines a distortion term with rate terms for the high- and low-frequency latent representations: $L = \lambda d(x, \hat{x}) + r_{y^H} + r_{y^L} + r_{z^H} + r_{z^L}$. Experiments show substantial BD-rate reductions compared to prior LIC methods and competitive results versus H.266/VVC-SCC on several SC datasets, with performance varying by dataset distribution. The work provides a practical SC-focused compression framework and releases SDU-SCICD2K to foster further development in learned screen content compression.
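To make the loss in the TL;DR concrete, the following is a minimal sketch of how such a rate-distortion objective is commonly evaluated. It is not the authors' implementation. It assumes that $d$ is the mean squared error, that each rate term is the estimated bits per pixel computed from the entropy model's likelihoods, and that $y$ and $z$ are the latents and hyper-latents of a hyperprior model, which is the usual reading of this notation. The function names (`rate_bpp`, `mse`, `rd_loss`), the dictionary keys, and the toy numbers are all hypothetical.

```python
import math

def rate_bpp(likelihoods, num_pixels):
    """Estimated rate in bits per pixel: -sum(log2 p) / num_pixels (assumed form)."""
    return -sum(math.log2(p) for p in likelihoods) / num_pixels

def mse(x, x_hat):
    """Mean squared error, used here as the distortion d (an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def rd_loss(x, x_hat, likelihoods, lam):
    """L = lam * d(x, x_hat) + r_yH + r_yL + r_zH + r_zL.

    `likelihoods` maps each of the four streams (hypothetical keys
    'y_H', 'y_L', 'z_H', 'z_L') to the probabilities that an entropy
    model assigns to its quantized symbols.
    """
    num_pixels = len(x)  # toy case: one flat, single-channel image
    rate = sum(rate_bpp(likelihoods[k], num_pixels)
               for k in ("y_H", "y_L", "z_H", "z_L"))
    return lam * mse(x, x_hat) + rate

# Toy example with made-up values, for illustration only.
x = [0.0, 1.0, 0.5, 0.25]
x_hat = [0.0, 1.0, 0.5, 0.75]
toy_likelihoods = {
    "y_H": [0.5, 0.5],  # 2 bits
    "y_L": [0.25],      # 2 bits
    "z_H": [0.5],       # 1 bit
    "z_L": [0.5],       # 1 bit
}
loss = rd_loss(x, x_hat, toy_likelihoods, lam=16.0)
print(loss)  # 16 * 0.0625 + 6 bits / 4 pixels = 2.5
```

In this form, $\lambda$ sets the trade-off: a larger value penalizes distortion more heavily and pushes the model toward higher bitrates.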

Abstract

Screen content (SC) differs from natural scene (NS) content in characteristics such as the absence of noise, repetitive patterns, and high contrast. To address the inadequacies of current learned image compression (LIC) methods for SC, we propose improved two-stage octave convolutional residual blocks (IToRB) for high- and low-frequency feature extraction and cascaded two-stage multi-scale residual blocks (CTMSRB) for improved multi-scale learning and nonlinearity in SC. Additionally, we employ a window-based attention module (WAM) to capture pixel correlations, especially in high-contrast regions of the image. We also construct a diverse SC image compression dataset (SDU-SCICD2K) for training, which includes text, charts, graphics, animation, movies, games, and mixtures of SC and NS images. Experimental results show that our method, which is better suited to SC than to NS data, outperforms existing LIC methods in rate-distortion performance on SC images. The code is publicly available at https://github.com/SunshineSki/OMR Net.git.

Paper Structure (15 sections, 3 equations, 7 figures, 2 tables)

Introduction
Related work
Traditional screen content image compression
Learned screen content image compression
Proposed method
Improved two-stage octave residual block
Cascaded two-stage multi-scale residual blocks
Window-based attention module
Loss function
Experimental Results
Training dataset
Experimental setting
RD performance
Ablation study
Conclusion

Figures (7)

Figure 1: Comparison between (a) NS and (b) SC. Red boxes: differences between sharp NS textures and sharp SC textures. Yellow boxes: distinctions between smooth NS regions with visually similar pixels and smooth SC regions with identical pixels. Green boxes: differences between similar NS patterns and repetitive SC patterns with identical pixels.
Figure 2: Proposed framework. Q denotes quantization, AE and AD denote arithmetic encoding and decoding, respectively. The decoder mirrors the encoder structure, utilizing transposed convolutions.
Figure 3: The structure of the IToRB. Red arrows indicate high-to-low frequency information transmission, while gray arrows depict the reverse. Dotted arrows represent shortcuts with stride. RB denotes a residual block, and k$m$s$n$ denotes a convolution with kernel size $m$ and stride $n$. N denotes the number of channels. The structures of $f$, $f_{\uparrow}$ and $f_{\downarrow}$ are redesigned. $f_{\uparrow}$ is a variant of $f_{\downarrow}$ that employs transposed convolution.
Figure 4: (a) The multi-scale residual block (MSRB) in [li2018multi]. (b) The improved MSRB in [fu2023asymmetric]. (c) The proposed cascaded two-stage multi-scale residual blocks (CTMSRB). $R$ denotes ReLU, $L$ denotes Leaky ReLU, and $G$ denotes GDN.
Figure 5: Sample images of the proposed SDU-SCICD2K dataset.
...and 2 more figures
