Latent Diffusion Model-Enabled Low-Latency Semantic Communication in the Presence of Semantic Ambiguities and Wireless Channel Noises

Jianhua Pei, Cheng Feng, Ping Wang, Hina Tabassum, Dongyuan Shi

TL;DR

Extensive numerical experiments show that the proposed SemCom system is robust to outliers, can transmit data with unknown distributions, and performs real-time channel denoising while preserving high human perceptual quality.

Abstract

Deep learning (DL)-based semantic communication (SemCom) is becoming critical to maximizing the overall efficiency of communication networks. Nevertheless, SemCom is sensitive to wireless channel uncertainties and source outliers, and suffers from poor generalization. To address these challenges, this paper develops a latent diffusion model-enabled SemCom system with three key contributions: i) to handle potential outliers in the source data, semantic errors obtained by projected gradient descent, which exploit the vulnerabilities of DL models, are used to update the parameters and obtain an outlier-robust encoder; ii) a lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter and is placed before the decoder at the receiver, enabling adaptation to out-of-distribution data and enhancing human-perceptual quality; and iii) an end-to-end consistency distillation (EECD) strategy is used to distill the diffusion models trained in latent space, enabling deterministic single-step or few-step low-latency denoising in various noisy channels while maintaining high semantic quality. Extensive numerical experiments across different datasets show that the proposed SemCom system is robust to outliers, can transmit data with unknown distributions, and performs real-time channel denoising while preserving high human perceptual quality, outperforming existing denoising approaches in semantic metrics such as the multi-scale structural similarity index measure (MS-SSIM) and learned perceptual image patch similarity (LPIPS).
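To make contribution i) more concrete, the following is a minimal sketch of projected gradient descent (PGD) searching for a bounded perturbation that shifts an encoder's output as much as possible. It is an illustration under stated assumptions, not the paper's procedure: the linear encoder `W`, the squared-shift objective, the L-infinity ball, and the names `pgd_semantic_error`, `eps`, `step`, and `iters` are all hypothetical choices made here for brevity.

```python
import numpy as np

def pgd_semantic_error(x, W, eps=0.1, step=0.02, iters=20, seed=0):
    """Search for delta with ||delta||_inf <= eps that maximizes the
    squared output shift of a toy linear encoder z = W @ x (assumed model)."""
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-eps, eps, size=x.shape)  # random start inside the ball
    for _ in range(iters):
        # Objective: ||W(x + delta) - W x||^2 = ||W delta||^2,
        # whose gradient with respect to delta is 2 W^T W delta.
        grad = 2.0 * W.T @ (W @ delta)
        delta = delta + step * np.sign(grad)  # signed gradient ascent step
        delta = np.clip(delta, -eps, eps)     # project back onto the L-inf ball
    return delta

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 16))   # hypothetical encoder weights
x = rng.normal(size=16)        # hypothetical source sample
delta = pgd_semantic_error(x, W)
shift = np.linalg.norm(W @ (x + delta) - W @ x)
print("max |delta|:", np.max(np.abs(delta)), "latent shift:", shift)
```

In a robust-training loop, perturbed samples such as `x + delta` would then be fed back to update the encoder parameters; that step depends on the paper's loss and is not reproduced here.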

Paper Structure

This paper contains 23 sections, 2 theorems, 32 equations, 12 figures, 3 tables, 5 algorithms.

Key Result

Proposition 1

Ignoring the channel's cross-entropy term of the latent space and taking into account the receiver reconstruction term and the transmitter encoding entropy, the VUB defined in Eq. JSCC_loss can be transformed into a simplified form (the equation is not reproduced here). The proof is given in Appendix transVAEWGAN.

Figures (12)

  • Figure 1: The proposed SemCom system with three addressed DL-based communication challenges: ① robust GAN inversion with semantic errors, ② domain adaptation with unknown distribution, and ③ real-time wireless channel denoising with EECD. Here $\bm{\mu}$ and $\bm{\sigma}$ are the two components of the latent bottleneck of the VAE; $\bm{H}_z$, $\bm{H}_n$, and $\sigma^2$ are the CSIs, $\bm{z}_R$ is the real-valued transmitted encoding, and $\bm{y}_R$ is the equalized received signal, as defined in Section \ref{sec:LCDM}. $f_e(\cdot)$ represents the modulation encoding for 256-QAM, while $f_d(\cdot)$ represents the demodulation decoding for 256-QAM. Definitions of the other symbols can be found in Section \ref{sec:System}.
  • Figure 2: Self-supervised robust encoder optimization with semantic error $\bm{\delta}$.
  • Figure 3: Out-of-domain latent space determination using a lightweight single-layer network and adversarial training.
  • Figure 4: In the proposed SemCom model, data is mapped into the latent space via the robust encoder $q_{\bm{\phi}'}(\bm{z}_0|\bm{x})$. EECD then maps the noisy received signal to a denoised latent vector ($\bm{z}_{t_m} \rightarrow \bm{z}_{\varepsilon }$), and the decoder generates data with the desired semantic meaning via $p_{\bm{\psi}}(\bm{x}|\bm{z}_{\varepsilon })$.
  • Figure 5: Typical decoded images without and with the robust encoder under AWGN and Rayleigh channels at an SNR of 20 dB.
  • ...and 7 more figures
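
The captions above refer to 256-QAM symbols sent over AWGN and Rayleigh channels at an SNR of 20 dB. As a rough illustration of that setting, the sketch below simulates both channels with NumPy. It is not the paper's simulation code: the function names `awgn` and `rayleigh`, the flat-fading model, the perfect-CSI zero-forcing equalizer, and the uniformly random symbols are assumptions made here.

```python
import numpy as np

def awgn(z, snr_db, rng):
    """Add complex Gaussian noise so that signal power / noise power = SNR."""
    signal_power = np.mean(np.abs(z) ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (
        rng.normal(size=z.shape) + 1j * rng.normal(size=z.shape)
    )
    return z + noise

def rayleigh(z, snr_db, rng):
    """Flat Rayleigh fading plus AWGN, then zero-forcing equalization
    with perfectly known channel gains (an assumption of this sketch)."""
    h = (rng.normal(size=z.shape) + 1j * rng.normal(size=z.shape)) / np.sqrt(2)
    y = awgn(h * z, snr_db, rng)
    return y / h

rng = np.random.default_rng(0)
n = 100_000
levels = np.arange(-15, 16, 2)  # 16 amplitude levels per axis, giving 256-QAM
z = levels[rng.integers(0, 16, size=n)] + 1j * levels[rng.integers(0, 16, size=n)]

y_awgn = awgn(z, 20.0, rng)
y_ray = rayleigh(z, 20.0, rng)

# Empirical SNR of the AWGN output, in dB
measured_snr_db = 10 * np.log10(
    np.mean(np.abs(z) ** 2) / np.mean(np.abs(y_awgn - z) ** 2)
)
print("measured AWGN SNR (dB):", measured_snr_db)
```

After equalization, the Rayleigh output carries noise amplified by `1/h` wherever the fading gain is small, which is one reason a learned denoiser can help more on fading channels than on plain AWGN.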

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2