COMPASS: High-Efficiency Deep Image Compression with Arbitrary-scale Spatial Scalability

Jongmin Park, Jooyoung Lee, Munchurl Kim

TL;DR

COMPASS addresses NN-based spatially scalable image compression with arbitrary scale factors by introducing Local Implicit Filter Function (LIFF)-based inter-layer prediction and a residual compression module shared across enhancement layers. The method encodes $K+1$ arbitrarily scaled versions of an image using a base layer and one or more enhancement layers with a recursive reconstruction scheme, trained with a combined rate-distortion objective $L = \sum_{k=1}^{K} \left( R^k + \lambda D^k \right)$. Empirical results show BD-rate gains of up to $-58.33\%$ against SHVC and up to $-47.17\%$ against the state-of-the-art NN-based scalable codec, while remaining competitive with single-layer coding and using fewer parameters. This makes the method suited to one-source-multiple-use (OSMU) scenarios, where flexible scale factors are needed without sacrificing coding efficiency or quality.

Abstract

Recently, neural network (NN)-based image compression has been actively studied and has shown impressive performance in comparison to traditional methods. However, most works have focused on non-scalable image compression (single-layer coding), while spatially scalable image compression has drawn less attention despite its many applications. In this paper, we propose a novel NN-based spatially scalable image compression method, called COMPASS, which supports arbitrary-scale spatial scalability. COMPASS has a very flexible structure in which the number of layers and their respective scale factors can be arbitrarily determined during inference. To reduce the spatial redundancy between adjacent layers for arbitrary scale factors, COMPASS adopts an inter-layer arbitrary-scale prediction method, called LIFF, based on implicit neural representation. We propose a combined RD loss function to effectively train multiple layers. Experimental results show that COMPASS achieves maximum BD-rate gains of -58.33% and -47.17% compared to SHVC and the state-of-the-art NN-based spatially scalable image compression method, respectively, for various combinations of scale factors. COMPASS also shows comparable or even better coding efficiency than single-layer coding for various scale factors.
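The combined RD objective quoted in the TL;DR can be illustrated with a few lines of Python. This is only a minimal sketch of the formula as summarized there, not the authors' training code: the function name `combined_rd_loss`, the choice of bits per pixel for the rate, MSE for the distortion, and the numbers in the usage lines are all assumptions made for illustration.

```python
from typing import Sequence


def combined_rd_loss(rates: Sequence[float],
                     distortions: Sequence[float],
                     lam: float) -> float:
    """Sum R^k + lam * D^k over the layers that are passed in.

    rates[i] and distortions[i] belong to the same layer. Which layers
    enter the sum (e.g. ELs only) is decided by the caller.
    """
    if len(rates) != len(distortions):
        raise ValueError("need exactly one rate and one distortion per layer")
    return sum(r + lam * d for r, d in zip(rates, distortions))


# Hypothetical values for two enhancement layers (not taken from the paper).
el_rates = [0.20, 0.35]          # assumed unit: bits per pixel
el_distortions = [0.004, 0.002]  # assumed metric: mean squared error
loss = combined_rd_loss(el_rates, el_distortions, lam=100.0)
print(f"combined RD loss: {loss:.4f}")
```

A single $\lambda$ is shared by all layers here, matching the formula in the TL;DR; per-layer weights would be a different design and are not implied by this summary.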

Paper Structure (11 sections, 5 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: COMPASS supports spatially scalable coding of $K+1$ arbitrarily scaled versions of an image using a base layer (BL) and one or more enhancement layers (ELs). The EL-$k$ ($1 \leq k \leq K$) exploits a shared subnetwork comprising the Inter-layer Arbitrary Scale Prediction and Residual Compression modules. $I_0$ indicates the smallest-sized input image in the BL. $I_1, ..., I_K$ are the input images in the ELs in increasing order of scale factor, where $I_K$ is the largest-sized input image. Note that the scale factor between two adjacent layers can be any arbitrary positive value.
  • Figure 2: Overall architecture of our COMPASS. It consists of a base layer (BL) depicted in the sky blue box and one or more enhancement layers (ELs) depicted in the light purple boxes which operate in an iterative manner. Note that we exploit the shared modules (LIFF and residual compression) for multiple ELs.
  • Figure 3: A predicted image via the LIFF module. (a) the reconstruction of the previous layer $k-1$, (b) the output (predicted image) of the LIFF module, (c) the input image of the current layer $k$, (d) the residual image as the input of the residual compression module.
  • Figure 4: The rate-PSNR performance curves of the final ELs for SHVC [boyce2015overview], the simulcast coding, Mei et al. [mei2021learning], the single-layer coding, and our COMPASS. The 'acc. bits' indicates the accumulated bits up to the final EL.
  • Figure 5: Visual comparison results for the kodim23.png, kodim03.png, and kodim17.png images in the Kodak Lossless True Color Image dataset [franzen1999kodak] (best viewed in digital format). The 'acc. bits' indicates the accumulated bits up to the final EL. We match the accumulated bits among the compared methods as closely as possible. Zoom in for better visual comparison.
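
The layered flow described in the captions of Figures 1 to 3 (predict the current layer from the previous reconstruction, code the residual, add it back) can be sketched as a toy in pure Python. This is not the COMPASS network: nearest-neighbour resampling stands in for the learned LIFF predictor, uniform quantisation stands in for the learned residual compression module and for the base-layer codec, and all function names and the test pattern are hypothetical.

```python
from typing import List

Image = List[List[float]]


def make_image(h: int, w: int) -> Image:
    """Hypothetical test pattern: a smooth diagonal ramp in [0, 1)."""
    return [[(x + y) / (h + w) for x in range(w)] for y in range(h)]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Stand-in for LIFF: nearest-neighbour resampling to an arbitrary size."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[min(in_h - 1, int((y + 0.5) * in_h / out_h))]
            [min(in_w - 1, int((x + 0.5) * in_w / out_w))]
         for x in range(out_w)]
        for y in range(out_h)
    ]


def code_residual(residual: Image, step: float) -> Image:
    """Stand-in for the shared residual codec: uniform quantisation."""
    return [[round(v / step) * step for v in row] for row in residual]


def reconstruct_layers(inputs: List[Image], step: float = 0.05) -> List[Image]:
    """inputs[0] is the BL image; inputs[1:] are the EL images, small to large."""
    # Base layer: simply quantised here; a real BL would use its own codec.
    recons = [code_residual(inputs[0], step)]
    for target in inputs[1:]:
        h, w = len(target), len(target[0])
        # Inter-layer prediction from the previous layer's reconstruction.
        pred = resize_nearest(recons[-1], h, w)
        resid = [[t - p for t, p in zip(tr, pr)] for tr, pr in zip(target, pred)]
        # The same residual codec is reused for every enhancement layer.
        resid_hat = code_residual(resid, step)
        recons.append([[p + r for p, r in zip(pr, rr)]
                       for pr, rr in zip(pred, resid_hat)])
    return recons


# Three layers with non-integer scale factors between them (2 -> 3 -> 5).
layers = [make_image(2, 2), make_image(3, 3), make_image(5, 5)]
for k, rec in enumerate(reconstruct_layers(layers)):
    print(f"layer {k}: {len(rec)}x{len(rec[0])}")
```

Because each layer only codes what the prediction misses, a better predictor leaves a smaller residual to compress; in this toy the reconstruction error per layer is bounded by half the quantisation step regardless of the scale factor.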