Dual-Representation Image Compression at Ultra-Low Bitrates via Explicit Semantics and Implicit Textures
Chuqin Zhou, Xiaoyue Ling, Yunuo Chen, Jincheng Dai, Guo Lu, Wenjun Zhang
TL;DR
This work addresses ultra-low bitrate image compression by unifying explicit semantic information with implicit diffusion-based textures in a training-free framework. By encoding compact explicit signals ($\hat{y}, c$) and transmitting texture details through reverse-channel coding conditioned on those signals, the method achieves strong perceptual quality while preserving semantic fidelity. A plug-in distortion-perception module and tile-based processing provide fine-grained control and scalability to high-resolution inputs, enabling flexible rate–distortion–perception tradeoffs. Empirical results demonstrate state-of-the-art performance on Kodak, DIV2K, and CLIC2020, with substantial gains in DISTS, CLIPSim, and related metrics over prior diffusion- and explicit-prior methods, and strong compatibility with multiple base codecs.
Abstract
While recent neural codecs achieve strong performance at low bitrates when optimized for perceptual quality, their effectiveness deteriorates significantly under ultra-low bitrate conditions. To mitigate this, generative compression methods leveraging semantic priors from pretrained models have emerged as a promising paradigm. However, existing approaches are fundamentally constrained by a tradeoff between semantic faithfulness and perceptual realism. Methods based on explicit representations preserve content structure but often lack fine-grained textures, whereas implicit methods can synthesize visually plausible details at the cost of semantic drift. In this work, we propose a unified framework that bridges this gap by coherently integrating explicit and implicit representations in a training-free manner. Specifically, we condition a diffusion model on explicit high-level semantics while employing reverse-channel coding to implicitly convey fine-grained details. Moreover, we introduce a plug-in encoder that enables flexible control of the distortion-perception tradeoff by modulating the implicit information. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art rate-perception performance, outperforming existing methods and surpassing DiffC by 29.92%, 19.33%, and 20.89% in DISTS BD-Rate on the Kodak, DIV2K, and CLIC2020 datasets, respectively.
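For readers unfamiliar with the BD-Rate figures quoted above, the following is a minimal sketch of how a Bjøntegaard delta rate is commonly computed. It assumes the conventional cubic-fit formulation. The function name `bd_rate` and all sample numbers are hypothetical and purely illustrative, and the authors' actual evaluation procedure may differ.

```python
import numpy as np

def bd_rate(rate_anchor, metric_anchor, rate_test, metric_test):
    """Average bitrate change (%) of a test codec relative to an anchor
    at equal metric values. Negative means the test codec saves bits.
    Assumes four or more rate/metric points per codec."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    m_a = np.asarray(metric_anchor, dtype=float)
    m_t = np.asarray(metric_test, dtype=float)

    # Fit log-rate as a cubic polynomial of the quality metric.
    poly_a = np.polyfit(m_a, log_ra, 3)
    poly_t = np.polyfit(m_t, log_rt, 3)

    # Integrate both fits over the overlapping metric interval.
    lo = max(m_a.min(), m_t.min())
    hi = min(m_a.max(), m_t.max())
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Mean log-rate difference converted to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0

# Illustrative (made-up) operating points: bits per pixel and DISTS.
anchor_bpp = [0.01, 0.02, 0.04, 0.08]
anchor_dists = [0.30, 0.25, 0.20, 0.16]
# A hypothetical test codec reaching the same DISTS with 20% fewer bits.
test_bpp = [0.8 * r for r in anchor_bpp]
test_dists = list(anchor_dists)

result = bd_rate(anchor_bpp, anchor_dists, test_bpp, test_dists)
print(f"BD-Rate: {result:.2f}%")
```

Because the fit is taken over the metric axis, the same routine applies whether the metric is higher-is-better (PSNR) or lower-is-better (DISTS); only the overlapping metric interval matters. Under this convention, a method that saves bitrate relative to the anchor yields a negative value.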
