
Wavelet-Like Transform-Based Technology in Response to the Call for Proposals on Neural Network-Based Image Coding

Cunhui Dong, Haichuan Ma, Haotian Zhang, Changsheng Gao, Li Li, Dong Liu

TL;DR

This paper introduces iWaveV3, a novel wavelet-like transform-based end-to-end image coding framework. It achieves state-of-the-art compression efficiency for objective quality, is very competitive for perceptual quality, and has been adopted as a candidate scheme for developing the IEEE standard for neural network-based image coding.

Abstract

Neural network-based image coding has developed rapidly since its inception. By 2022, its performance had surpassed that of the best-performing traditional image coding framework, H.266/VVC. In light of this success, the IEEE 1857.11 working subgroup initiated a neural network-based image coding standard project and issued a corresponding call for proposals (CfP). In response to the CfP, this paper introduces a novel wavelet-like transform-based end-to-end image coding framework, iWaveV3. iWaveV3 extends our previous wavelet-like transform-based framework, iWave++, with several new features, including an affine wavelet-like transform, a perceptual-friendly quality metric, and more advanced training and online optimization strategies. While preserving support for both lossy and lossless compression in a single framework, iWaveV3 achieves state-of-the-art compression efficiency for objective quality and is very competitive for perceptual quality. As a result, iWaveV3 has been adopted as a candidate scheme for developing the IEEE standard for neural network-based image coding.

Paper Structure (30 sections, 13 equations, 16 figures, 5 tables)

This paper contains 30 sections, 13 equations, 16 figures, 5 tables.

Figures (16)

  • Figure 1: Overview of the proposed iWaveV3, which mainly consists of four modules. The transform module offers three options: the additive transform, the affine transform, and the CDF 5/3 wavelet transform, with the additive and affine transforms implemented using CNNs. The quantization module first divides the subband coefficients by QStep and then rounds them to integers. The entropy coding module codes the quantized coefficients into a bitstream using an autoregressive context model. The post-processing module alleviates quantization distortion and improves the quality of the reconstructed images.
  • Figure 2: The structure of the additive wavelet-like transform and the affine wavelet-like transform. $S$ stands for split; $P_i$ and $U_i$ stand for the $i$-th prediction unit and the $i$-th update unit, respectively. Both are constructed from CNNs. $N$ is the number of lifting steps.
  • Figure 3: The structure of $P_i$ or $U_i$ in the additive wavelet-like transform and the affine wavelet-like transform. Numbers such as 3$\times$3$\times$16 indicate the kernel size (3$\times$3) and the number of channels (16). ReLU is the nonlinear activation function used.
  • Figure 4: The pipeline of the forward transform for two-dimensional images. $P_i$ and $U_i$ can be those of the additive wavelet-like transform, the affine wavelet-like transform, or the CDF 5/3 wavelet transform.
  • Figure 5: (a) The subbands obtained by a three-level wavelet transform. The red line denotes the order in which the subbands are coded. (b) Within each subband, the wavelet coefficients are coded one by one along the red broken line.
  • ...and 11 more figures
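
The split, predict, and update structure described in the captions of Figures 1 and 2 can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation. It assumes a single additive lifting step, uses the standard reversible integer CDF 5/3 filters in place of the CNN-based $P_i$ and $U_i$, mirrors samples at the boundaries, and requires an even-length input. All function names are hypothetical, and the rounding mode in `quantize` is Python's built-in `round`, which may differ from the rounding used in the paper.

```python
def split(x):
    """S in Figure 2: separate even- and odd-indexed samples."""
    return x[0::2], x[1::2]


def merge(even, odd):
    """Inverse of split: interleave even and odd samples."""
    out = [0] * (len(even) + len(odd))
    out[0::2] = even
    out[1::2] = odd
    return out


def predict_53(even):
    """P: predict each odd sample from its two even neighbours.
    The right boundary is mirrored (an assumption of this sketch)."""
    nxt = even[1:] + even[-1:]
    return [(a + b) // 2 for a, b in zip(even, nxt)]


def update_53(high):
    """U: correction for the even samples computed from the residuals.
    The left boundary is mirrored (an assumption of this sketch)."""
    prev = high[:1] + high[:-1]
    return [(a + b + 2) // 4 for a, b in zip(prev, high)]


def forward_lifting(x, predict, update):
    """One additive lifting step: returns (low, high) subbands."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even-length signal")
    even, odd = split(x)
    high = [o - p for o, p in zip(odd, predict(even))]
    low = [e + u for e, u in zip(even, update(high))]
    return low, high


def inverse_lifting(low, high, predict, update):
    """Undo the lifting step by reversing the order and the signs."""
    even = [l - u for l, u in zip(low, update(high))]
    odd = [h + p for h, p in zip(high, predict(even))]
    return merge(even, odd)


def quantize(coeffs, qstep):
    """Divide by QStep, then round to integers (Python's round is half-to-even)."""
    return [round(c / qstep) for c in coeffs]


def dequantize(levels, qstep):
    """Scale the integer levels back by QStep."""
    return [v * qstep for v in levels]


signal = [10, 12, 14, 13, 9, 8, 8, 20]
low, high = forward_lifting(signal, predict_53, update_53)

# Without quantization the lifting step is exactly invertible (lossless path).
assert inverse_lifting(low, high, predict_53, update_53) == signal

# With quantization the reconstruction is only approximate (lossy path).
qstep = 4
low_hat = dequantize(quantize(low, qstep), qstep)
high_hat = dequantize(quantize(high, qstep), qstep)
approx = inverse_lifting(low_hat, high_hat, predict_53, update_53)
```

Because the inverse reuses the same `predict` and `update` functions with the signs flipped, the round trip is exact for any choice of those functions. This is why a lifting structure can serve both lossless and lossy coding: the only loss in this sketch comes from `quantize`.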