Latency-Aware Generative Semantic Communications with Pre-Trained Diffusion Models

Li Qiao; Mahdi Boloursaz Mashhadi; Zhen Gao; Chuan Heng Foh; Pei Xiao; Mehdi Bennis

TL;DR

The paper addresses ultra-low-rate semantic communications by leveraging pre-trained generative foundation models to synthesize signals at the receiver from compressed semantic streams. It proposes a latency-aware GenSemCom framework that combines multi-modal semantic decomposition, a re-transmission-based scheme for reliable delivery of the prompt, adaptive modulation and coding for the conditioning signals, and latency-aware power allocation under semantic quality constraints. At the receiver, a pre-trained diffusion model generates high-fidelity outputs guided by the prompt and conditioning signals, enabling universal applicability without shared knowledge bases. Simulations on image data show ultra-low-rate, low-latency, and channel-adaptive performance, and quantify the trade-offs between latency, power, and semantic quality metrics such as CLIP and MS-SSIM.

Abstract

Generative foundation AI models have recently shown great success in synthesizing natural signals with high perceptual quality using only textual prompts and conditioning signals to guide the generation process. This enables semantic communications at extremely low data rates in future wireless networks. In this paper, we develop a latency-aware semantic communications framework with pre-trained generative models. The transmitter performs multi-modal semantic decomposition on the input signal and transmits each semantic stream with the appropriate coding and communication schemes based on the intent. For the prompt, we adopt a re-transmission-based scheme to ensure reliable transmission, and for the other semantic modalities we use an adaptive modulation/coding scheme to achieve robustness to the changing wireless channel. Furthermore, we design a semantic and latency-aware scheme to allocate transmission power to different semantic modalities based on their importance, subject to semantic quality constraints. At the receiver, a pre-trained generative model synthesizes a high-fidelity signal using the received multi-stream semantics. Simulation results demonstrate ultra-low-rate, low-latency, and channel-adaptive semantic communications.
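The re-transmission-based delivery of the prompt can be illustrated with a minimal sketch. It assumes independent bit errors, an uncoded prompt packet, and perfect error detection at the receiver, none of which is stated in the abstract; the function names and the packet length are hypothetical and chosen only for illustration. Under these assumptions the number of attempts is geometric, so the average number of transmissions is the reciprocal of the packet success probability.

```python
import random


def packet_success_prob(ber: float, n_bits: int) -> float:
    """Probability that all n_bits arrive error-free (assumes independent bit errors)."""
    return (1.0 - ber) ** n_bits


def expected_transmissions(ber: float, n_bits: int) -> float:
    """Mean number of attempts of a geometric re-transmission process."""
    return 1.0 / packet_success_prob(ber, n_bits)


def simulate_arq(ber: float, n_bits: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the average number of attempts per prompt packet."""
    rng = random.Random(seed)
    p_ok = packet_success_prob(ber, n_bits)
    total_attempts = 0
    for _ in range(trials):
        attempts = 1
        # Re-send until a packet gets through without any bit error.
        while rng.random() >= p_ok:
            attempts += 1
        total_attempts += attempts
    return total_attempts / trials


if __name__ == "__main__":
    ber, n_bits = 1e-3, 200  # hypothetical prompt of 200 bits
    print("analytical:", expected_transmissions(ber, n_bits))
    print("simulated: ", simulate_arq(ber, n_bits))
```

The analytical and simulated averages should agree closely, which gives a quick sanity check on how the prompt latency grows as the channel degrades.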

Paper Structure (15 sections, 1 theorem, 5 equations, 4 figures, 2 tables)

Introduction
Generative Foundation AI-based Semantic Communication
Multi-modal Semantic Decomposition and Synthesis
Semantic-aware Multi-Stream Transmission
Re-transmission-based Communication of the Textual Prompt
Adaptive Modulation and Coding for other Semantic Modalities
Latency-aware Adaptive Semantic Communication
Simulation Results
System setup
Semantic quality metrics
Monotonicity of the quality metrics
Visual quality of the proposed framework
Latency-aware Adaptive Semantic Communication
Computation latency
Conclusions

Key Result

Lemma 1

For the case of one conditioning signal, i.e., $I=1$, if $\Phi_j(\text{BER}_1)$, $\forall j\in[J]$, are monotonically non-increasing with respect to $\text{BER}_1$, (OptiProb) is a convex problem. The optimal solution can be achieved if and only if $p_0+ p_{1}= P_{\rm{T}}$, $T_0(p_0)=T_1(p_1, \text{
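The two conditions visible in the statement above, full use of the power budget and equal latency for the two streams, suggest a simple numerical procedure. The following is a minimal sketch, not the paper's algorithm: it assumes each latency is strictly decreasing in its own power and uses a hypothetical Shannon-rate latency model (`shannon_latency`) as a stand-in for the paper's $T_0$ and $T_1$, whose exact forms are not given here. All function names and parameter values are assumptions.

```python
import math
from typing import Callable


def shannon_latency(bits: float, bandwidth_hz: float, gain_over_noise: float) -> Callable[[float], float]:
    """Hypothetical latency model: payload size divided by Shannon rate at power p."""
    def latency(p: float) -> float:
        rate = bandwidth_hz * math.log1p(p * gain_over_noise) / math.log(2.0)
        return bits / rate
    return latency


def balance_power(t0: Callable[[float], float],
                  t1: Callable[[float], float],
                  p_total: float,
                  iters: int = 200) -> float:
    """Bisection for p0 such that t0(p0) == t1(p_total - p0), with p0 + p1 = p_total.

    Assumes both latencies decrease with power, so t0(p0) - t1(p_total - p0)
    is decreasing in p0 and has a single sign change.
    """
    eps = 1e-12 * p_total  # stay away from zero power, where latency diverges
    lo, hi = eps, p_total - eps
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if t0(mid) > t1(p_total - mid):
            lo = mid  # prompt is slower: give it more power
        else:
            hi = mid  # conditioning signal is slower: give it more power
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    t_prompt = shannon_latency(bits=800, bandwidth_hz=1e6, gain_over_noise=10.0)
    t_edge = shannon_latency(bits=80_000, bandwidth_hz=1e6, gain_over_noise=10.0)
    p0 = balance_power(t_prompt, t_edge, p_total=1.0)
    print("p0 =", p0, "p1 =", 1.0 - p0)
    print("latencies:", t_prompt(p0), t_edge(1.0 - p0))
```

With these illustrative numbers the short prompt needs only a small share of the power to finish at the same time as the much larger conditioning signal.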

Figures (4)

Figure 1: The Proposed Framework for Latency-aware Multi-stream Semantic Communication with Multi-Modal Generative Models.
Figure 2: Normalized CLIP and MS-SSIM versus BER of the edge map: (a) Prompt generated by GPT-4; (b) Prompt generated by BLIP.
Figure 3: Visualization of the semantic quality of our proposed framework. The absolute CLIP/MS-SSIM values are reported in this figure for comparison.
Figure 4: Optimal wireless parameters versus average SNR $\overline{\gamma}$ at various target BERs: (a) Percentage of power for prompt; (b) Transmission Latency; (c) Modulation order (bits per symbol) for the edge map; (d) Average numbers of prompt re-transmissions. The (CLIP, MS-SSIM) notation in the legend represents the normalized CLIP/MS-SSIM values achieved for each curve.
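Figures 2 and 4 report results against the BER of the edge map. A minimal way to emulate a given BER on a binary edge map is to flip each bit independently with that probability, as sketched below. This is an assumption about how such corruption could be simulated, not a description of the paper's channel model; `flip_bits` and `measured_ber` are hypothetical helper names, and the edge map is represented as a plain list of 0/1 rows.

```python
import random
from typing import List

EdgeMap = List[List[int]]


def flip_bits(edge_map: EdgeMap, ber: float, seed: int = 0) -> EdgeMap:
    """Flip every bit independently with probability ber (binary symmetric channel)."""
    rng = random.Random(seed)
    return [[(bit ^ 1) if rng.random() < ber else bit for bit in row]
            for row in edge_map]


def measured_ber(sent: EdgeMap, received: EdgeMap) -> float:
    """Fraction of positions where the received map differs from the sent one."""
    errors = sum(a != b for row_s, row_r in zip(sent, received)
                 for a, b in zip(row_s, row_r))
    total = sum(len(row) for row in sent)
    return errors / total


if __name__ == "__main__":
    # Toy 100x100 edge map with a single horizontal edge in the middle.
    clean = [[1 if r == 50 else 0 for _ in range(100)] for r in range(100)]
    noisy = flip_bits(clean, ber=0.1)
    print("empirical BER:", measured_ber(clean, noisy))
```

The corrupted map could then be passed to the generative model as the conditioning signal to study how quality metrics respond to the target BER.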

Theorems & Definitions (2)

Lemma 1
proof
