SC-CDM: Enhancing Quality of Image Semantic Communication with a Compact Diffusion Model
Kexin Zhang, Lixin Li, Wensheng Lin, Yuna Yan, Wenchi Cheng, Zhu Han
TL;DR
This work tackles bandwidth-efficient, perceptual-quality-aware wireless image transmission by combining semantic communication with generative modeling. It deploys a Swin Transformer-based semantic encoder at the transmitter and a compact diffusion-model decoder at the receiver, enabling high-fidelity reconstruction from compressed semantic content. The approach improves PSNR by over 17% relative to CNN-based DeepJSCC and performs robustly across AWGN and Rayleigh channels, with a reported improvement in perceptual quality and graceful degradation under challenging channel conditions. The key contribution is a compact diffusion module with a slim prior that guides image restoration, achieving high-quality semantic recovery at a lower computational load than traditional diffusion models.
Abstract
Semantic Communication (SC) is an emerging technology that has attracted much attention for sixth-generation (6G) mobile communication systems. However, few studies have fully considered the perceptual quality of the reconstructed image. To address this problem, we propose a generative SC system for wireless image transmission (denoted as SC-CDM). This approach leverages compact diffusion models to improve the fidelity and semantic accuracy of the images reconstructed after transmission, ensuring that the essential content is preserved even in bandwidth-constrained environments. Specifically, we redesign the Swin Transformer as a new backbone for efficient semantic feature extraction and compression. At the receiver, we integrate a slim prior network and an image reconstruction network. In contrast to traditional Diffusion Models (DMs), the receiver uses the DM's robust distribution mapping capability to generate a compact condition vector that guides image recovery, thereby enhancing the perceptual details of the reconstructed images. Finally, a series of evaluation and ablation studies validate the effectiveness and robustness of the proposed algorithm, which increases the Peak Signal-to-Noise Ratio (PSNR) by over 17% compared with CNN-based DeepJSCC.
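The abstract reports results in terms of PSNR, and the summary mentions AWGN and Rayleigh channels. The following is a minimal NumPy sketch of how these quantities are commonly simulated and measured; it is not the authors' code. The function names (`awgn_channel`, `rayleigh_channel`, `psnr`), the use of complex-valued channel symbols, the flat-fading model, and perfect channel knowledge at the receiver are all assumptions made for illustration.

```python
import numpy as np


def awgn_channel(x, snr_db, rng, signal_power=None):
    """Add complex Gaussian noise to symbols x at a target SNR in dB.

    Assumption: x is an array of complex channel symbols. The noise power
    is set relative to signal_power (measured from x if not given).
    """
    if signal_power is None:
        signal_power = np.mean(np.abs(x) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Split the noise power evenly between real and imaginary parts.
    noise = np.sqrt(noise_power / 2.0) * (
        rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    )
    return x + noise


def rayleigh_channel(x, snr_db, rng):
    """Flat Rayleigh fading followed by AWGN and zero-forcing equalization.

    Assumption: the receiver knows the fading coefficients h exactly.
    """
    # Unit-average-power complex Gaussian fading coefficients.
    h = (rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)) / np.sqrt(2.0)
    signal_power = np.mean(np.abs(x) ** 2)
    y = awgn_channel(h * x, snr_db, rng, signal_power=signal_power)
    return y / h


def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

In a pipeline of this kind, the encoder output would be mapped to complex symbols, passed through one of the channel functions, and decoded, with `psnr` comparing the decoded image to the original. Zero-forcing equalization is only one simple choice and amplifies noise when a fading coefficient is close to zero.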
