HDCompression: Hybrid-Diffusion Image Compression for Ultra-Low Bitrates

Lei Lu, Yize Li, Yanzhi Wang, Wei Wang, Wei Jiang

TL;DR

HDCompression tackles ultra-low bitrate image compression by fusing a conventional LIC stream with a generative VQ and diffusion stream in a dual-stream framework. It introduces dense representative vectors (DRVs) and lightweight diffusion modules that supply input-specific fidelity information and refine index prediction without extra transmission overhead, together with a VQ-latent correction path. Two fusion modules, DRV-based LIC enhancement and VQ correction, merge fidelity and perceptual priors. Empirical results show that HDCompression balances perceptual quality (LPIPS) and fidelity (PSNR) at ultra-low bitrates, outperforming prior LIC, VQ-based, and hybrid methods on standard benchmarks.

Abstract

Image compression at ultra-low bitrates remains challenging for both conventional learned image compression (LIC) and generative vector-quantized (VQ) modeling. Conventional LIC suffers from severe artifacts caused by heavy quantization, while generative VQ modeling yields poor fidelity because the learned generative priors do not match specific inputs. In this work, we propose Hybrid-Diffusion Image Compression (HDCompression), a dual-stream framework that combines generative VQ modeling, diffusion models, and conventional LIC to achieve both high fidelity and high perceptual quality. Unlike previous hybrid methods, which directly use pre-trained LIC models to generate low-quality fidelity-preserving information from heavily quantized latents, we use diffusion models to extract high-quality complementary fidelity information from the ground-truth input. This improves the system in three ways: better index map prediction, an enhanced fidelity-preserving output from the LIC stream, and refined conditioned image reconstruction through VQ-latent correction. In addition, our diffusion model is based on a dense representative vector (DRV), making it lightweight with very simple sampling schedulers. Extensive experiments demonstrate that HDCompression outperforms previous conventional LIC, generative VQ-modeling, and hybrid frameworks in both quantitative metrics and qualitative visualization, providing balanced, robust compression performance at ultra-low bitrates.
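
To make the VQ side of the trade-off concrete, the following is a minimal sketch of nearest-codebook quantization and the bit cost of sending an index map. It is not the authors' implementation: the function names, the toy 2-D codebook, the 16x16 index grid, the codebook size of 1024, and the 256x256 image size are all hypothetical values chosen for illustration, and fixed-length index coding is assumed.

```python
import math


def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
        for z in latents
    ]


def index_map_bpp(grid_h, grid_w, codebook_size, image_h, image_w):
    """Bits per pixel when every index is sent with a fixed-length code."""
    bits_per_index = math.ceil(math.log2(codebook_size))
    return grid_h * grid_w * bits_per_index / (image_h * image_w)


# Hypothetical toy codebook and latents (2-D for readability).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
latents = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]

indices = quantize(latents, codebook)          # only these would be transmitted
reconstructed = [codebook[i] for i in indices]  # decoder-side lookup

# Hypothetical setting: 16x16 index grid, 1024 codewords, 256x256 image.
bpp = index_map_bpp(16, 16, 1024, 256, 256)
```

The decoder only ever sees codebook entries, so any gap between a latent and its nearest codeword is lost. That gap is one way to picture the mismatch between generic learned priors and a specific input that the abstract describes.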

Paper Structure

This paper contains 24 sections, 10 equations, 7 figures.

Figures (7)

  • Figure 1: Visual comparisons of different methods. Bitrates are listed as percentages relative to our method. The traditional hand-crafted VVC and the conventional LIC method MLIC produce severe blur, the single-stream VQ-codebook-based VQGAN generates inauthentic details, and HybridFlow shows high-frequency artifacts. Our HDCompression retains both fidelity and clarity.
  • Figure 2: System overview. We sample two dense representative vectors (DRVs) with denoising networks (DNs) conditioned on the base LIC output $\hat{\textbf{x}}$. These DRVs serve as global guidance for enhancing fidelity and mask prediction. The enhanced LIC output $\hat{\textbf{x}}_{lic}$ further infuses fidelity information into the mask predictor and VQ decoder in the generative stream.
  • Figure 3: DRV $\hat{\textbf{v}}_{joint_P}$ embedding process in $\textbf{T}$.
  • Figure 4: VQ Correction Module for dual-stream merging.
  • Figure 5: Quantitative metrics on the Kodak, Tecnick, and CLIC2020 test sets. Higher PSNR is better; lower LPIPS is better.
  • ...and 2 more figures
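
For readers unfamiliar with the fidelity metric in Figure 5, here is a minimal sketch of the standard PSNR definition, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. It assumes 8-bit pixel values flattened into plain Python lists. The function name and sample values are illustrative and do not come from the paper's evaluation code. LPIPS is not sketched because it depends on a pre-trained network.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values: every pixel is off by exactly one level.
original = [10, 64, 128, 200]
decoded = [11, 65, 129, 201]
score = psnr(original, decoded)
```

Because the mean squared error sits in the denominator, a smaller pixel-wise error gives a larger PSNR, which is why higher values indicate better fidelity.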