Deeply-Conditioned Image Compression via Self-Generated Priors

Zhineng Zhao, Zhihai He, Zikun Zhou, Siwei Ma, Yaowei Wang

TL;DR

The paper addresses two linked problems in learned image compression: the entanglement of global structure with local texture in a single latent, and the geometric deformation this causes at low bitrates. It introduces DCIC-sgp, which first encodes a self-generated structure prior and then uses that prior to condition the entire compression pipeline. This functional decomposition separates the stable structural backbone from transient textures, so the analysis transform can focus on residual details while the entropy model receives global context. Reported BD-rate savings against VTM-12.1 are 14.4%, 15.7%, and 15.1% on Kodak, CLIC, and Tecnick, respectively, with improved structure preservation at low bitrates and generalization to medical image domains. Overall, the work presents a deeply conditioned, internally guided compression framework that reaches highly competitive efficiency without prohibitive computation and could be extended to video and 3D data.

Abstract

Learned image compression (LIC) has shown great promise for achieving high rate-distortion performance. However, current LIC methods are often limited in their capability to model the complex correlation structures inherent in natural images, particularly the entanglement of invariant global structures with transient local textures within a single monolithic representation. This limitation precipitates severe geometric deformation at low bitrates. To address this, we introduce a framework predicated on functional decomposition, which we term Deeply-Conditioned Image Compression via self-generated priors (DCIC-sgp). Our central idea is to first encode a potent, self-generated prior to encapsulate the image's structural backbone. This prior is subsequently utilized not as mere side-information, but to holistically modulate the entire compression pipeline. This deep conditioning, most critically of the analysis transform, liberates it to dedicate its representational capacity to the residual, high-entropy details. This hierarchical, dependency-driven approach achieves an effective disentanglement of information streams. Our extensive experiments validate this assertion; visual analysis demonstrates that our method substantially mitigates the geometric deformation artifacts that plague conventional codecs at low bitrates. Quantitatively, our framework establishes highly competitive performance, achieving significant BD-rate reductions of 14.4%, 15.7%, and 15.1% against the VVC test model VTM-12.1 on the Kodak, CLIC, and Tecnick datasets.
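The data flow described in the abstract can be illustrated with a toy, hand-written sketch. Everything below is an assumption made for illustration: the function names, the block-mean "prior", the residual-style analysis step, and the `gain` and `block` parameters are hypothetical stand-ins for the paper's learned networks ($E_s$, $g_a$, $g_s$), not the authors' implementation. It only shows the ordering of dependencies: the prior is coded first, and both the analysis and synthesis steps are conditioned on the decoded prior.

```python
def prior_extractor(x, block=4):
    """Toy stand-in for E_s: block means serve as a coarse 'structure' signal."""
    return [sum(x[i:i + block]) / len(x[i:i + block])
            for i in range(0, len(x), block)]


def quantize(values):
    """Round to integers; entropy coding would then transmit these losslessly."""
    return [round(v) for v in values]


def upsample(s_hat, n, block=4):
    """Repeat each prior value over its block so it aligns with the signal."""
    return [s_hat[i // block] for i in range(n)]


def analysis(x, s_hat, gain=2.0, block=4):
    """Toy stand-in for the conditioned g_a: encode only what s_hat misses."""
    base = upsample(s_hat, len(x), block)
    return [gain * (xi - bi) for xi, bi in zip(x, base)]


def synthesis(y_hat, s_hat, gain=2.0, block=4):
    """Toy stand-in for the conditioned g_s: combine prior and detail."""
    base = upsample(s_hat, len(y_hat), block)
    return [bi + yi / gain for yi, bi in zip(y_hat, base)]


def compress_decompress(x):
    s_hat = quantize(prior_extractor(x))   # structure stream, coded first
    y_hat = quantize(analysis(x, s_hat))   # detail stream, conditioned on s_hat
    x_hat = synthesis(y_hat, s_hat)        # reconstruction uses both streams
    return s_hat, y_hat, x_hat


x = [10.2, 10.9, 11.4, 10.1, 50.3, 49.8, 51.1, 50.6]
s_hat, y_hat, x_hat = compress_decompress(x)
print("prior  :", s_hat)
print("detail :", y_hat)
print("recon  :", x_hat)
```

In this toy the detail stream holds small integers clustered around zero, because the prior already accounts for the large level change between the two blocks. That is the intuition behind letting the analysis transform spend its capacity on residual detail, although the actual method learns this behavior rather than subtracting a baseline explicitly.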

Paper Structure

This paper contains 32 sections, 6 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: A conceptual comparison of compression paradigms. (a) The standard framework, which relies on a single, entangled latent representation. (b) Our proposed DCIC-sgp framework, which instantiates the principle of functional decomposition through a causally-dependent hierarchical architecture. In this paradigm, a Prior Extractor ($E_s$) generates a structure prior ($s$), which then holistically guides the entire pipeline by modulating the analysis transform ($g_a$), assisting the entropy model ($P$), and steering the synthesis transform ($g_s$).
  • Figure 2: The overall framework of the proposed Deeply-Conditioned Image Compression with self-generated priors (DCIC-sgp) method. A Prior Extractor ($E_s$) maps the original image to a structure prior ($s$). The decoded prior ($\hat{s}$) is then used to guide both the Conditioned Analysis Transform ($g_a$) and the Entropy Model. The Conditioned Synthesis Transform ($g_s$) combines both representations ($\hat{s}$ and $\hat{y}$) for final reconstruction.
  • Figure 3: Our unified Entropy Model. It features two distinct processing paths: one for the structure prior $s$ (top), which employs a standard hyperprior mechanism, and another for the detail representation $y$ (bottom), which uses a conditional mechanism. To be precise with the notation used in the figure, $\bar{s}$ and $\bar{y}$ represent the quantized latents before entropy coding, while $\hat{s}$ and $\hat{y}$ represent the latents after entropy decoding. Due to lossless entropy coding, they are numerically identical (e.g., $\bar{s} = \hat{s}$). The parameter networks $P_m$ and $P_s$ fuse information from the hyper-decoder with the decoded structure prior $\hat{s}$ to generate the final distribution parameters for $y$.
  • Figure 4: Rate-distortion performance of our DCIC-sgp models (DCIC-sgp-MSH and DCIC-sgp-TCM) compared against their respective baselines and other leading methods across various datasets and metrics.
  • Figure 5: Visual comparison of reconstructed images by our method against baseline methods. The close-ups highlight the superior ability of our DCIC-sgp framework to mitigate geometric deformation.
  • ...and 4 more figures
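The Figure 3 caption says that the parameter networks $P_m$ and $P_s$ fuse hyper-decoder information with the decoded prior $\hat{s}$ to produce distribution parameters for $y$. The sketch below shows why such conditioning can reduce rate, using the discretized Gaussian likelihood that is common in learned compression. It rests on several assumptions: the Gaussian form is assumed here rather than confirmed by the captions, and `p_m`, `p_s`, their weights, and the input values are hypothetical placeholders for the learned networks.

```python
import math


def gaussian_cdf(t, mean, scale):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((t - mean) / (scale * math.sqrt(2.0))))


def bits_for_symbol(y_hat, mean, scale):
    """Bits to code an integer symbol under a Gaussian with unit-width bins."""
    p = (gaussian_cdf(y_hat + 0.5, mean, scale)
         - gaussian_cdf(y_hat - 0.5, mean, scale))
    return -math.log2(max(p, 1e-9))  # clamp to avoid log of zero


def p_m(hyper_feat, s_hat, w_h=0.5, w_s=0.5):
    """Hypothetical stand-in for P_m: a fixed weighted fusion for the mean."""
    return w_h * hyper_feat + w_s * s_hat


def p_s(hyper_feat, s_hat, floor=0.11):
    """Hypothetical stand-in for P_s: a positive scale with an arbitrary floor."""
    return max(floor, 0.5 + 0.25 * abs(hyper_feat - s_hat))


y_hat = 3          # quantized detail symbol to be coded
hyper_feat = 2.0   # illustrative value from the hyper-decoder
s_hat = 4.0        # illustrative value from the decoded structure prior

# Hyper-decoder alone: mean comes from hyper_feat only.
bits_hyper_only = bits_for_symbol(y_hat, mean=hyper_feat, scale=1.0)

# Fused parameters: mean and scale depend on both sources.
mean = p_m(hyper_feat, s_hat)
scale = p_s(hyper_feat, s_hat)
bits_with_prior = bits_for_symbol(y_hat, mean=mean, scale=scale)

print(f"hyper only : {bits_hyper_only:.3f} bits")
print(f"with prior : {bits_with_prior:.3f} bits")
```

With these illustrative numbers the fused mean lands on the symbol being coded, so the probability mass in its bin is larger and the estimated cost is lower. Whether the learned networks achieve a comparable gain on real images is an empirical question answered by the paper's rate-distortion results, not by this sketch.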