Semantic-Preserving Image Coding based on Conditional Diffusion Models
Francesco Pezone, Osman Musa, Giuseppe Caire, Sergio Barbarossa
TL;DR
SPIC addresses semantic image coding by prioritizing semantic content over exact pixel fidelity. It losslessly encodes a Semantic Segmentation Map (SSM), encodes a downscaled version of the image, and uses a semantically conditioned diffusion model to reconstruct a high-resolution image from both inputs. The modular pipeline combines off-the-shelf components (e.g., INTERN-2.5 for segmentation and FLIF/BPG for compression) with a dual-conditioned diffusion decoder, preserving semantically important objects at a favorable rate-distortion trade-off. Experiments on Cityscapes show better semantic retention (higher mIoU) and perceptual quality (lower FID) than traditional codecs and super-resolution (SR) approaches, supporting the method's potential for semantic communications and efficient image transmission.
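To make the transmitter side concrete, here is a minimal sketch of its two payloads and the resulting rate in bits per pixel. It assumes a small set of stand-ins that are not the authors' code. `zlib` replaces FLIF for lossless SSM coding. Block averaging plus uniform quantization and `zlib` replace BPG for the low-resolution image. The names `downscale`, `encode`, `decode_ssm`, `factor` and `q_step` are hypothetical.

```python
import zlib
import numpy as np

def downscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Block-average downsampling by an integer factor (H and W must be divisible by it)."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def encode(img: np.ndarray, ssm: np.ndarray, factor: int = 4, q_step: int = 8):
    """Hypothetical SPIC-style transmitter sketch.

    Assumption: zlib stands in for FLIF (lossless SSM coding), and uniform
    quantization + zlib stands in for BPG (lossy low-resolution image coding).
    """
    # Lossless payload: the class-index map must be recovered exactly.
    ssm_bytes = zlib.compress(ssm.astype(np.uint8).tobytes(), 9)
    # Lossy payload: downscale, then quantize (values 0..255 map to 0..32).
    low = downscale(img.astype(np.float64), factor)
    low_q = np.round(low / q_step).astype(np.uint8)
    low_bytes = zlib.compress(low_q.tobytes(), 9)
    # Rate is measured against the full-resolution pixel count.
    h, w = ssm.shape
    bpp = 8 * (len(ssm_bytes) + len(low_bytes)) / (h * w)
    return ssm_bytes, low_bytes, bpp

def decode_ssm(ssm_bytes: bytes, shape: tuple) -> np.ndarray:
    """Receiver-side lossless recovery of the SSM."""
    return np.frombuffer(zlib.decompress(ssm_bytes), dtype=np.uint8).reshape(shape)
```

The split mirrors the design described above. The SSM carries the semantics and is coded without loss. The downscaled image carries coarse appearance cheaply, and the diffusion decoder is left to synthesize the missing detail.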
Abstract
Rather than on the bit-by-bit recovery of the transmitted messages, semantic communication focuses on the meaning and the goal of the communication itself. In this paper, we propose a novel semantic image coding scheme that preserves the semantic content of an image while ensuring a good trade-off between coding rate and image quality. The transmitter of the proposed Semantic-Preserving Image Coding based on Conditional Diffusion Models (SPIC) encodes a Semantic Segmentation Map (SSM) and a low-resolution version of the image to be transmitted. The receiver then reconstructs a high-resolution image using a Denoising Diffusion Probabilistic Model (DDPM) doubly conditioned on the SSM and the low-resolution image. As shown by the numerical examples, the proposed SPIC achieves a better balance than state-of-the-art (SOTA) approaches between the conventional rate-distortion trade-off and the preservation of semantically relevant features. Code is available at https://github.com/frapez1/SPIC
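On the receiver side, the double conditioning can be illustrated with standard DDPM ancestral sampling. At each step a noise predictor also sees the SSM and an upsampled copy of the low-resolution image. The sketch below implements the usual update x_{t-1} = (x_t - β_t / sqrt(1 - ᾱ_t) · ε_θ) / sqrt(α_t) + sqrt(β_t) · z. Several details are assumptions rather than SPIC's actual design: the linear β schedule, nearest-neighbour upsampling, and the interface of the hypothetical `eps_model(x_t, t, ssm, low_up)`. How the real network injects the two conditions is not specified here.

```python
import numpy as np

def linear_betas(T: int, start: float = 1e-4, end: float = 0.02) -> np.ndarray:
    """Linear noise schedule (assumed; a common DDPM default)."""
    return np.linspace(start, end, T)

def nearest_upsample(low: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of an (h, w, c) array (assumed conditioning input)."""
    return low.repeat(factor, axis=0).repeat(factor, axis=1)

def sample(eps_model, ssm: np.ndarray, low: np.ndarray, factor: int,
           T: int = 1000, seed: int = 0) -> np.ndarray:
    """Ancestral DDPM sampling conditioned on (SSM, upsampled low-res image).

    eps_model(x_t, t, ssm, low_up) is a hypothetical noise predictor that
    returns an array with the same shape as x_t.
    """
    rng = np.random.default_rng(seed)
    betas = linear_betas(T)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    low_up = nearest_upsample(low, factor)
    x = rng.standard_normal(low_up.shape)  # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        eps = eps_model(x, t, ssm, low_up)  # both conditions seen at every step
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Variance choice sigma_t^2 = beta_t.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x
```

In practice `eps_model` would be a trained network. A placeholder that returns zeros is enough to check that shapes and control flow work.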
