End-to-End Optimized Image Compression with the Frequency-Oriented Transform

Yuefeng Zhang, Kai Lin

TL;DR

This work proposes an end-to-end optimized image compression model built on a frequency-oriented transform. The model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric, and it enables scalable coding through the selective transmission of arbitrary frequency components.

Abstract

Image compression is a significant challenge in the era of information explosion. Recent studies have shown that learning-based image compression methods can outperform traditional codecs. However, these methods share an inherent weakness: a lack of interpretability. Following an analysis of how compression degradation varies across frequency bands, we propose an end-to-end optimized image compression model built on a frequency-oriented transform. The model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, a decomposition that aligns with a human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments demonstrate that our model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify that the proposed compression method preserves semantic fidelity in addition to signal-level precision.
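The idea of non-overlapping frequency bands and selective transmission can be illustrated with a minimal sketch. This is not the learned transform from the paper: it assumes fixed, disjoint radial masks in the 2-D Fourier domain as a stand-in, and the function names (`radial_frequency`, `split_bands`, `fuse`) and the cutoff values are hypothetical choices made for illustration only. Because the masks partition the spectrum, summing all bands recovers the input, and summing a subset gives a coarser reconstruction.

```python
import numpy as np

def radial_frequency(shape):
    """Radial frequency (cycles per pixel) of every 2-D FFT bin; 0 at DC."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)

def split_bands(image, cutoffs=(0.05, 0.2)):
    """Split a grayscale image into low/mid/high bands using disjoint FFT masks.

    The cutoffs are illustrative assumptions, not values from the paper.
    """
    radius = radial_frequency(image.shape)
    spectrum = np.fft.fft2(image)
    low_cut, high_cut = cutoffs
    masks = {
        "low": radius < low_cut,
        "mid": (radius >= low_cut) & (radius < high_cut),
        "high": radius >= high_cut,
    }
    # The masks are symmetric under frequency negation, so each band is real
    # up to rounding error; np.real discards the negligible imaginary part.
    return {name: np.real(np.fft.ifft2(spectrum * mask))
            for name, mask in masks.items()}

def fuse(bands, selected):
    """Aggregate the selected bands by pixel-wise summation."""
    return sum(bands[name] for name in selected)

rng = np.random.default_rng(0)
image = rng.random((32, 32))
bands = split_bands(image)

full = fuse(bands, ["low", "mid", "high"])   # all bands transmitted
coarse = fuse(bands, ["low"])                # only the low band transmitted
```

In this sketch, a receiver that obtains only some bands still produces a valid image, and each additional band reduces the reconstruction error. The paper's model learns its decomposition end to end, so the fixed masks here only mimic the non-overlapping property.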

Paper Structure (37 sections, 8 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: (a) Conceptual illustration of our proposed end-to-end optimized image compression model with the frequency-oriented transform, in which the original image signal is transformed into several frequency splits to further eliminate redundancy. (b) Power spectral density distribution, computed through the Fourier transform, comparing the degradation caused by different compression methods.
  • Figure 2: Comparison of the transform module used in end-to-end optimized image compression models. The unfilled rectangle represents an intermediate feature and the slashed rectangle means the feature used for entropy coding. The length of the rectangle represents its relative spatial size.
  • Figure 3: Overall architecture of our proposed compression model. Q is the quantization and SUM represents the pixel-wise sum. The frequency-oriented transform decomposes the input image into non-overlapping frequency components, i.e., $y_{low}, y_{mid}, y_{high}$. The frequency-aware fusion module is designed for frequency selection.
  • Figure 4: Illustration of frequency-aware fusion. Selected frequency components are aggregated at the decoder side by the SUM operation. $f$ represents the feature matrix.
  • Figure 5: Compression performance evaluation on Kodak dataset.
  • ...and 5 more figures
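
A power spectral density comparison like the one described for Figure 1(b) is commonly obtained by averaging the squared Fourier magnitude over rings of equal radial frequency. The sketch below shows that standard computation under stated assumptions: the function name `radial_psd`, the bin count, and the normalization are hypothetical, and the paper's exact procedure is not given in this excerpt.

```python
import numpy as np

def radial_psd(image, n_bins=16):
    """Radially averaged power spectral density of a grayscale image.

    Returns (bin_centers, mean_power), with frequency in cycles per pixel.
    """
    power = np.abs(np.fft.fft2(image)) ** 2 / image.size
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2).ravel()

    edges = np.linspace(0.0, 0.5, n_bins + 1)
    index = np.digitize(radius, edges) - 1
    # FFT bins with radius >= 0.5 (the spectrum corners) are left out.
    valid = (index >= 0) & (index < n_bins)

    totals = np.bincount(index[valid], weights=power.ravel()[valid],
                         minlength=n_bins)
    counts = np.bincount(index[valid], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, totals / np.maximum(counts, 1)

# A horizontal cosine with 9 cycles across 64 pixels (about 0.14 cycles/pixel).
x = np.arange(64)
image = np.tile(np.cos(2 * np.pi * 9 * x / 64), (64, 1))
centers, psd = radial_psd(image)
```

Applying the same function to an original image and to its decoded versions from different codecs would show in which frequency range each codec loses power, which is the kind of comparison the caption describes.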