SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models
Jiakang Chen, Selim F. Yilmaz, Di You, Pier Luigi Dragotti, Deniz Gündüz
TL;DR
SING reframes DeepJSCC-based wireless image transmission as an inverse problem and uses diffusion priors to improve perceptual quality under harsh channel conditions. It comes in two variants: SING-Zero, a zero-shot method that approximates the degradation as a linear transformation and restores images with range-null-space guidance, and SING-INN, which models the non-linear degradation with an invertible neural network (INN) conditioned on the channel SNR to guide the diffusion-based reconstruction. The approach is agnostic to the encoder/decoder and can operate without modifying the transmitter. It achieves better perceptual quality, as measured by LPIPS, and generalizes robustly under distribution shift, including extremely low BCR and SNR settings and cross-dataset transfer. Experiments on CelebA-HQ, together with mismatch tests on TinyImageNet, show that SING improves over DeepJSCC and InverseJSCC, especially in perceptual quality. Overall, SING offers a practical, diffusion-guided approach to semantic communications that requires minimal changes to the physical layer.
Abstract
Joint source-channel coding systems based on deep neural networks (DeepJSCC) have recently demonstrated remarkable performance in wireless image transmission. Existing methods primarily focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality. This can lead to severe perceptual degradation when transmitting images under extreme conditions, such as low bandwidth compression ratios (BCRs) and low signal-to-noise ratios (SNRs). In this work, we propose SING, a novel two-stage JSCC framework that formulates the recovery of high-quality source images from corrupted reconstructions as an inverse problem. Depending on the availability of information about the DeepJSCC encoder/decoder and the channel at the receiver, SING can either approximate the stochastic degradation as a linear transformation, or leverage invertible neural networks (INNs) for precise modeling. Both approaches enable the seamless integration of diffusion models into the reconstruction process, enhancing perceptual quality. Experimental results demonstrate that SING outperforms DeepJSCC and other approaches, delivering superior perceptual quality even under extremely challenging conditions, including scenarios with significant distribution mismatches between the training and test data.
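To make the linear approximation more concrete, here is a minimal sketch of a range-null-space decomposition, the general idea behind the range-null-space guidance named in the TL;DR. It is not the authors' implementation. The matrix `A`, the function `range_null_space_refine`, and the random vectors are all hypothetical stand-ins: `A` plays the role of a linearized degradation, `y` the corrupted reconstruction at the receiver, and `x_prior` an estimate that a diffusion model might propose.

```python
import numpy as np


def range_null_space_refine(A, y, x_prior):
    """Keep the range-space content fixed by y and take the rest from x_prior.

    Computes x_hat = A^+ y + (I - A^+ A) x_prior, where A^+ is the
    Moore-Penrose pseudo-inverse of A.
    """
    A_pinv = np.linalg.pinv(A)
    range_part = A_pinv @ y                        # content determined by the observation
    null_part = x_prior - A_pinv @ (A @ x_prior)   # content invisible to A, taken from the prior
    return range_part + null_part


# Toy setup (all values are illustrative assumptions, not from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 16))    # hypothetical linear degradation, 16 -> 4 dimensions
x_true = rng.standard_normal(16)    # unknown source signal
y = A @ x_true                      # noiseless observation
x_prior = rng.standard_normal(16)   # stand-in for a diffusion model's estimate

x_hat = range_null_space_refine(A, y, x_prior)

# The refined estimate is consistent with the observation under A.
print(np.allclose(A @ x_hat, y))
```

The design point is that the observation constrains only the component of the image that `A` can see, so a generative prior is free to fill in the null-space component without breaking consistency with the received signal. A real channel adds noise, which this noiseless toy example ignores.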
