SC-CDM: Enhancing Quality of Image Semantic Communication with a Compact Diffusion Model

Kexin Zhang, Lixin Li, Wensheng Lin, Yuna Yan, Wenchi Cheng, Zhu Han

TL;DR

This work tackles bandwidth-efficient, perceptual-quality-aware wireless image transmission by combining semantic communication with generative modeling. It deploys a Swin Transformer-based semantic encoder at the transmitter and a compact diffusion-model decoder at the receiver, enabling high-fidelity reconstruction from compressed semantic content. The approach raises PSNR by over 17% relative to CNN-based DeepJSCC and is evaluated on both AWGN and Rayleigh channels, where it improves perceptual quality and degrades gracefully as channel conditions worsen. The key contribution is a compact diffusion module with a slim prior that guides image restoration, achieving high-quality semantic recovery while reducing computational load compared with traditional diffusion models.

Abstract

Semantic Communication (SC) is an emerging technology that has attracted much attention for sixth-generation (6G) mobile communication systems. However, few studies have fully considered the perceptual quality of the reconstructed image. To solve this problem, we propose a generative SC system for wireless image transmission (denoted as SC-CDM). This approach leverages compact diffusion models to improve the fidelity and semantic accuracy of the images reconstructed after transmission, ensuring that the essential content is preserved even in bandwidth-constrained environments. Specifically, we redesign the Swin Transformer as a new backbone for efficient semantic feature extraction and compression. Next, the receiver integrates the slim prior and image reconstruction networks. In contrast to traditional Diffusion Models (DMs), the receiver leverages DMs' robust distribution mapping capability to generate a compact condition vector that guides image recovery, thereby enhancing the perceptual details of the reconstructed images. Finally, a series of evaluation and ablation studies validate the effectiveness and robustness of the proposed algorithm, which increases the Peak Signal-to-Noise Ratio (PSNR) by over 17% compared with CNN-based DeepJSCC.
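Since the abstract reports its headline result in PSNR, a minimal sketch of the standard PSNR definition may help in reading that number. The function name `psnr`, the flat-list image representation, and the default peak value of 255 (8-bit pixels) are assumptions made for illustration; the evaluation code used by the authors is not described in this summary.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel sequences.

    Assumption: images are passed as flat sequences of pixel intensities and
    `max_value` is the largest possible intensity (255 for 8-bit images).
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the reconstruction is closer to the original in a mean-squared-error sense. PSNR alone does not capture perceptual quality, which is presumably why the figure list also includes an SSIM plot.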

Paper Structure

This paper contains 11 sections, 3 equations, and 5 figures.

Figures (5)

  • Figure 1: The overall architecture of the proposed SC-CDM system.
  • Figure 2: The pipeline of the semantic fine-tuning module pre-training, which consists of $\bm{N}_1$ and $\bm{N}_2$.
  • Figure 3: PSNR performance versus the SNR. (a) AWGN channel. (b) Rayleigh channel.
  • Figure 4: The ablation performance versus the SNR. (a) PSNR. (b) SSIM.
  • Figure 5: Visual comparison examples under two types of channels at SNR=15dB.
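
The figures above compare performance over AWGN and Rayleigh channels at various SNRs. As a rough illustration of what those two channel models do to transmitted symbols, here is a minimal sketch using only the Python standard library. The function names, the block-fading model (one fading gain per call), and the perfect channel knowledge used for equalization are all assumptions for illustration; the exact channel setup in the paper is not given in this summary.

```python
import math
import random


def _mean_power(symbols):
    """Average power of a sequence of complex symbols."""
    return sum(abs(s) ** 2 for s in symbols) / len(symbols)


def _complex_noise(count, noise_power, rng):
    """Draw circularly symmetric complex Gaussian noise with the given total power."""
    sigma = math.sqrt(noise_power / 2.0)  # standard deviation per real dimension
    return [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(count)]


def awgn_channel(symbols, snr_db, rng):
    """AWGN: y = x + n, with noise power set from the signal power and the SNR in dB."""
    noise_power = _mean_power(symbols) / (10.0 ** (snr_db / 10.0))
    noise = _complex_noise(len(symbols), noise_power, rng)
    return [s + n for s, n in zip(symbols, noise)]


def rayleigh_channel(symbols, snr_db, rng):
    """Rayleigh block fading: y = h * x + n with h ~ CN(0, 1).

    Assumptions: a single gain h for the whole block, and a receiver that
    knows h exactly and equalizes by dividing by it.
    """
    h = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    noise_power = _mean_power(symbols) / (10.0 ** (snr_db / 10.0))
    noise = _complex_noise(len(symbols), noise_power, rng)
    return [(h * s + n) / h for s, n in zip(symbols, noise)]


# Usage: pass the same symbols through both channels at SNR = 15 dB.
rng = random.Random(0)
tx = [complex(1.0, 0.0)] * 4
rx_awgn = awgn_channel(tx, snr_db=15.0, rng=rng)
rx_rayleigh = rayleigh_channel(tx, snr_db=15.0, rng=rng)
```

In the Rayleigh case, dividing by a small `h` amplifies the noise, so the effective SNR after equalization varies from block to block. This is one reason fading channels are generally harder than AWGN at the same nominal SNR.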