Dual-Representation Image Compression at Ultra-Low Bitrates via Explicit Semantics and Implicit Textures

Chuqin Zhou; Xiaoyue Ling; Yunuo Chen; Jincheng Dai; Guo Lu; Wenjun Zhang

TL;DR

This work addresses ultra-low bitrate image compression by unifying explicit semantic information with implicit diffusion-based textures in a training-free framework. By encoding compact explicit signals ($\hat{y}, c$) and transmitting texture details through reverse-channel coding conditioned on those signals, the method achieves strong perceptual quality while preserving semantic fidelity. A plug-in distortion-perception module gives fine-grained control over the rate–distortion–perception tradeoff, and tile-based processing scales the method to high-resolution inputs. Empirical results demonstrate state-of-the-art performance across Kodak, DIV2K, and CLIC2020, with substantial gains in DISTS, CLIPSim, and related metrics over prior diffusion- and explicit-prior methods, and strong compatibility with multiple base codecs.
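The tile-based processing mentioned above can be pictured as covering a large image with fixed-size windows that are coded independently. The summary gives no tile size, overlap, or blending rule, so the sketch below is only an illustration under assumed values: `tile_starts`, `tile_boxes`, and the overlap parameter are hypothetical names and choices, not the authors' procedure.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of windows of size `tile` that cover [0, length).

    Neighbouring windows share at least `overlap` pixels (assumed scheme).
    """
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]  # a single (possibly padded) tile covers the axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def tile_boxes(height, width, tile, overlap):
    """List of (top, left, bottom, right) boxes covering the whole image."""
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, overlap)
        for left in tile_starts(width, tile, overlap)
    ]


# Hypothetical example: a 2048x1080 image, 512-pixel tiles, 64-pixel overlap.
boxes = tile_boxes(1080, 2048, tile=512, overlap=64)
print(len(boxes), boxes[0], boxes[-1])
```

Each box would be compressed and decoded on its own; how overlapping regions are merged afterwards is left open here because the summary does not describe it.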

Abstract

While recent neural codecs achieve strong performance at low bitrates when optimized for perceptual quality, their effectiveness deteriorates significantly under ultra-low bitrate conditions. To mitigate this, generative compression methods leveraging semantic priors from pretrained models have emerged as a promising paradigm. However, existing approaches are fundamentally constrained by a tradeoff between semantic faithfulness and perceptual realism. Methods based on explicit representations preserve content structure but often lack fine-grained textures, whereas implicit methods can synthesize visually plausible details at the cost of semantic drift. In this work, we propose a unified framework that bridges this gap by coherently integrating explicit and implicit representations in a training-free manner. Specifically, we condition a diffusion model on explicit high-level semantics while employing reverse-channel coding to implicitly convey fine-grained details. Moreover, we introduce a plug-in encoder that enables flexible control of the distortion-perception tradeoff by modulating the implicit information. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art rate-perception performance, outperforming existing methods and surpassing DiffC by 29.92%, 19.33%, and 20.89% in DISTS BD-Rate on the Kodak, DIV2K, and CLIC2020 datasets, respectively.
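The DISTS BD-Rate figures quoted above summarize the average bitrate difference between two rate-metric curves at equal quality. The abstract does not say how the curves are interpolated, so the following is only a minimal sketch. It swaps the cubic fit of the usual Bjøntegaard calculation for piecewise-linear interpolation of log-rate, and the function names and sample numbers are hypothetical, not results from the paper.

```python
import math


def _avg_log_rate(metric, rate, lo, hi):
    """Mean of log(rate) over the metric interval [lo, hi].

    Log-rate is interpolated linearly between the measured points
    (a simplification of the cubic fit used in standard BD-Rate).
    Metric values must be distinct.
    """
    pts = sorted(zip(metric, (math.log(r) for r in rate)))

    def interp(x):
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("metric value outside the measured curve")

    # Integrate with the trapezoid rule over every knot inside [lo, hi].
    xs = [lo] + [x for x, _ in pts if lo < x < hi] + [hi]
    area = sum((b - a) * (interp(a) + interp(b)) / 2 for a, b in zip(xs, xs[1:]))
    return area / (hi - lo)


def bd_rate(metric_anchor, rate_anchor, metric_test, rate_test):
    """Average rate change (%) of the test codec vs. the anchor at equal metric."""
    lo = max(min(metric_anchor), min(metric_test))
    hi = min(max(metric_anchor), max(metric_test))
    if hi <= lo:
        raise ValueError("curves do not overlap in quality")
    diff = (_avg_log_rate(metric_test, rate_test, lo, hi)
            - _avg_log_rate(metric_anchor, rate_anchor, lo, hi))
    return (math.exp(diff) - 1.0) * 100.0


# Made-up illustrative curves (DISTS, bits per pixel); NOT data from the paper.
dists = [0.30, 0.25, 0.20, 0.15]
anchor_bpp = [0.010, 0.020, 0.040, 0.080]
test_bpp = [r / 2 for r in anchor_bpp]  # test codec needs half the bits
print(bd_rate(dists, anchor_bpp, dists, test_bpp))  # approximately -50
```

Under this sign convention a negative value means the test codec needs fewer bits than the anchor for the same DISTS score. The percentages in the abstract are presumably savings of this kind, although the exact convention and fitting method used by the authors are not stated here.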

Paper Structure (30 sections, 2 equations, 14 figures, 5 tables, 4 algorithms)

Introduction
Related Work
Perceptual Image Compression
Ultra-Low Bitrate Image Compression
Preliminary
Diffusion Denoising Probabilistic Models
Reverse-Channel Coding
Proposed Method
Explicit Semantic Information
Implicit Textural Information
Decoding with Conditional Diffusion
Distortion-Perception Tradeoff
Tile-based Processing
Experiments
Settings
...and 15 more sections

Figures (14)

Figure 1: Comparison of different methods at ultra-low bitrates. Blue denotes explicit compression methods, green denotes implicit ones, and red represents our dual-representation approach. Our method better preserves textural and semantic details.
Figure 2: Visual examples and comparisons on a 2K-resolution image at ultra-low bitrates. Our method reconstructs more realistic and consistent details with fewer bits. In contrast, PerCo [Careil_2024_ICLR_Perco], DiffEIC [Li_25_TCSVT_DiffEIC], and DiffC [Vonderfecht_25_ICLR_Diffc] produce details that are inconsistent with the original image. Best viewed on screen for details.
Figure 3: Overview of the proposed dual-branch compression framework. Explicit semantics consist of the quantized latent $\hat{y}$ and a tag-style text prompt $c$, while implicit textures are derived from noise-corrupted latents using reverse-channel coding (RCC).
Figure 4: Rate–metric curves on Kodak, DIV2K, and CLIC2020 datasets. Arrows indicate whether higher ($\uparrow$) or lower ($\downarrow$) values are better. See supplementary for more results.
Figure 5: Rate–metric curves on the Kodak dataset. Our method is applied to multiple versions of each base model, trained at different bitrates, resulting in several curves per model. Our method consistently outperforms the corresponding baselines across all bitrate settings.
...and 9 more figures
