Table of Contents
Fetching ...

Semantic-Preserving Image Coding based on Conditional Diffusion Models

Francesco Pezone, Osman Musa, Giuseppe Caire, Sergio Barbarossa

TL;DR

SPIC addresses semantic image coding by prioritizing semantic content over exact pixel fidelity. It encodes a lossless Semantic Segmentation Map (SSM) and a downscaled image, and uses a semantically-conditioned diffusion model to reconstruct a high-resolution image from both inputs. The modular pipeline leverages off-the-shelf components (e.g., INTERN-2.5 for segmentation, FLIF/BPG for compression) and a dual-conditioned diffusion decoder to preserve semantically important objects at favorable rate-distortion. Experimental results on Cityscapes demonstrate improved semantic retention (higher mIoU) and perceptual quality (lower FID) compared with traditional codecs and SR approaches, validating the method’s potential for semantic communications and efficient image transmission.

Abstract

Semantic communication, rather than on a bit-by-bit recovery of the transmitted messages, focuses on the meaning and the goal of the communication itself. In this paper, we propose a novel semantic image coding scheme that preserves the semantic content of an image, while ensuring a good trade-off between coding rate and image quality. The proposed Semantic-Preserving Image Coding based on Conditional Diffusion Models (SPIC) transmitter encodes a Semantic Segmentation Map (SSM) and a low-resolution version of the image to be transmitted. The receiver then reconstructs a high-resolution image using a Denoising Diffusion Probabilistic Models (DDPM) doubly conditioned to the SSM and the low-resolution image. As shown by the numerical examples, compared to state-of-the-art (SOTA) approaches, the proposed SPIC exhibits a better balance between the conventional rate-distortion trade-off and the preservation of semantically-relevant features. Code available at https://github.com/frapez1/SPIC

Semantic-Preserving Image Coding based on Conditional Diffusion Models

TL;DR

SPIC addresses semantic image coding by prioritizing semantic content over exact pixel fidelity. It encodes a lossless Semantic Segmentation Map (SSM) and a downscaled image, and uses a semantically-conditioned diffusion model to reconstruct a high-resolution image from both inputs. The modular pipeline leverages off-the-shelf components (e.g., INTERN-2.5 for segmentation, FLIF/BPG for compression) and a dual-conditioned diffusion decoder to preserve semantically important objects at favorable rate-distortion. Experimental results on Cityscapes demonstrate improved semantic retention (higher mIoU) and perceptual quality (lower FID) compared with traditional codecs and SR approaches, validating the method’s potential for semantic communications and efficient image transmission.

Abstract

Semantic communication, rather than on a bit-by-bit recovery of the transmitted messages, focuses on the meaning and the goal of the communication itself. In this paper, we propose a novel semantic image coding scheme that preserves the semantic content of an image, while ensuring a good trade-off between coding rate and image quality. The proposed Semantic-Preserving Image Coding based on Conditional Diffusion Models (SPIC) transmitter encodes a Semantic Segmentation Map (SSM) and a low-resolution version of the image to be transmitted. The receiver then reconstructs a high-resolution image using a Denoising Diffusion Probabilistic Models (DDPM) doubly conditioned to the SSM and the low-resolution image. As shown by the numerical examples, compared to state-of-the-art (SOTA) approaches, the proposed SPIC exhibits a better balance between the conventional rate-distortion trade-off and the preservation of semantically-relevant features. Code available at https://github.com/frapez1/SPIC
Paper Structure (6 sections, 2 equations, 4 figures)

This paper contains 6 sections, 2 equations, 4 figures.

Figures (4)

  • Figure 1: Overview of the SPIC Architecture. The diagram illustrates our novel approach, combining a Semantic Segmentation Map ($\mathbf{s}$) and a coarse low-resolution image ($\mathbf{c}$), both compressed with classical out-of-the-shelf algorithms for efficient encoding. The reconstruction employs the proposed Semantic-Conditioned Super-Resolution Diffusion Model, leveraging both $s$ and $c$ for high-fidelity semantic-relevant image recovery even at low BPP.
  • Figure 2: (a) Resulting image compressed with the BPG algorithm at 0.176 BPP. (b) Reconstructed image employing our approach at 0.166 BPP. (c) Detail of the image compressed with the BPG. (d) Detail of the image reconstructed with our approach.
  • Figure 3: Detail comparison between (top) the image reconstructed with the SOTA SR model Super_res_CVPR and (bottom) the image reconstructed with our model
  • Figure 4: Comparison in terms of mIoU (a) and FID (b) vs. BPP.