
Generative Semantic Communication via Textual Prompts: Latency Performance Tradeoffs

Mengmeng Ren, Li Qiao, Long Yang, Zhen Gao, Jian Chen, Mahdi Boloursaz Mashhadi, Pei Xiao, Rahim Tafazolli, Mehdi Bennis

TL;DR

The paper tackles ultra-low-rate semantic communication by enabling edge-device collaboration to generate textual prompts with pre-trained M/VLMs. It formulates a two-level mixed-integer nonlinear program (MINLP) that jointly optimizes prompt-generation offloading and communication and computation resource allocation. It also introduces a CCQ metric to balance latency against semantic quality. An SLJ-based two-level matching algorithm solves the discrete outer problem (offloading decisions) and the convex inner problem (resource allocation). The algorithm achieves two-sided stability and is less sensitive to initialization. Simulations on image data show substantial latency-quality gains over semantic-unaware benchmarks in multi-user edge networks. The approach offers a practical pathway to deploying generative-AI-powered Gen SemCom at the network edge, with scalable tradeoffs between computation and communication.
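The two-level structure can be illustrated with a toy swap/leave/join search over offloading decisions. This is a minimal sketch under stated assumptions, not the authors' algorithm. The utility is hypothetical: the sum of per-user quality divided by latency, loosely echoing the normalized CIDEr/latency in Figure 3, and not the paper's CCQ. Each edge server's CPU is split equally among its users, a hypothetical stand-in for the convex inner resource-allocation problem. All parameter names are illustrative. A move is accepted whenever it raises the total utility, which is a simpler criterion than the two-sided stability the paper targets.

```python
from itertools import combinations

LOCAL = -1  # hypothetical sentinel: the user generates its prompt on-device


def latency(n, choice, loads, p):
    """Toy latency of user n for a given choice (hypothetical model)."""
    if choice == LOCAL:
        return p["local_cycles"][n] / p["local_freq"][n]
    # Inner-problem stand-in: the server's CPU is split equally among its users.
    share = p["edge_freq"][choice] / loads[choice]
    return p["tx_time"][n][choice] + p["edge_cycles"][n] / share


def total_utility(assign, p):
    """Sum over users of quality / latency (hypothetical utility, not the paper's CCQ)."""
    loads = {}
    for c in assign:
        if c != LOCAL:
            loads[c] = loads.get(c, 0) + 1
    total = 0.0
    for n, c in enumerate(assign):
        q = p["q_local"][n] if c == LOCAL else p["q_edge"][c]
        total += q / latency(n, c, loads, p)
    return total


def slj_matching(p, num_servers, max_rounds=100, eps=1e-12):
    """Greedy swap/leave/join search over offloading decisions (outer problem)."""
    n_users = len(p["local_cycles"])
    assign = [LOCAL] * n_users  # start with every user on-device
    best = total_utility(assign, p)
    options = [LOCAL] + list(range(num_servers))
    for _ in range(max_rounds):
        improved = False
        # Leave (server -> local), join (local -> server), or move between servers.
        for n in range(n_users):
            for c in options:
                if c == assign[n]:
                    continue
                trial = assign.copy()
                trial[n] = c
                u = total_utility(trial, p)
                if u > best + eps:
                    assign, best, improved = trial, u, True
        # Swap: two users with different choices exchange them.
        for a, b in combinations(range(n_users), 2):
            if assign[a] == assign[b]:
                continue
            trial = assign.copy()
            trial[a], trial[b] = trial[b], trial[a]
            u = total_utility(trial, p)
            if u > best + eps:
                assign, best, improved = trial, u, True
        if not improved:  # no single operation improves the utility: stop
            break
    return assign, best


# Toy example: 2 users, 1 edge server; all numbers are made up.
params = {
    "local_cycles": [1e9, 1e9], "local_freq": [1e8, 1e8],
    "edge_cycles": [1e9, 1e9], "edge_freq": [1e10],
    "tx_time": [[0.1], [0.1]],
    "q_local": [0.8, 0.8], "q_edge": [1.0],
}
assign, best = slj_matching(params, num_servers=1)
# assign == [0, 0]: both users offload to server 0
```

In this example, a weak local CPU makes offloading attractive even when two users share the server. Every accepted operation strictly increases the utility, and there are finitely many assignments, so the loop terminates.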

Abstract

This paper develops an edge-device collaborative Generative Semantic Communications (Gen SemCom) framework that leverages pre-trained Multi-modal/Vision Language Models (M/VLMs) for ultra-low-rate semantic communication via textual prompts. The proposed framework optimizes the use of M/VLMs at the wireless edge or on the device to generate high-fidelity textual prompts through visual captioning or question answering. These prompts are then transmitted over a wireless channel for SemCom. Specifically, we develop a multi-user Gen SemCom framework using pre-trained M/VLMs. We formulate a joint optimization problem over prompt-generation offloading and communication and computation resource allocation, which minimizes latency and maximizes the resulting semantic quality. The problem is nonconvex, with highly coupled discrete and continuous variables. We therefore decompose it into a two-level problem and propose a low-complexity swap/leaving/joining (SLJ)-based matching algorithm. Simulation results demonstrate significant performance improvements over the conventional semantic-unaware and non-collaborative offloading benchmarks.
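A toy latency model makes the tradeoff between on-device and offloaded prompt generation concrete. It is a sketch under assumptions, not the paper's system model. The offloaded mode is assumed to upload the raw image to the edge, run the M/VLM there, and then forward the prompt. Rates follow the Shannon formula B·log2(1 + SNR). Every number is hypothetical, including the image size, CPU cycles, CPU frequencies, bandwidth, and SNR. Only the 400-bit prompt length matches a setting in the figure captions.

```python
import math


def shannon_rate(bandwidth_hz, snr_linear):
    """Achievable rate in bit/s: B * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1 + snr_linear)


def on_device_latency(cycles, f_local, prompt_bits, rate):
    # Run the M/VLM locally, then send only the textual prompt.
    return cycles / f_local + prompt_bits / rate


def offloaded_latency(cycles, f_edge, image_bits, prompt_bits, rate_up, rate_down):
    # Hypothetical pipeline: upload the image, generate the prompt at the edge, forward the prompt.
    return image_bits / rate_up + cycles / f_edge + prompt_bits / rate_down


rate = shannon_rate(1e6, 15)   # 1 MHz bandwidth, linear SNR of 15 -> 4 Mbit/s
image_bits = 256 * 256 * 24    # uncompressed 256x256 RGB image: 1,572,864 bits
prompt_bits = 400              # prompt length used in the figure captions

t_local = on_device_latency(1e10, 1e9, prompt_bits, rate)                   # about 10.0001 s
t_edge = offloaded_latency(1e10, 1e11, image_bits, prompt_bits, rate, rate)  # about 0.4933 s
```

In this example, the prompt is roughly 3,900 times smaller than the raw image, so sending text is cheap in either mode. The deciding factor is where the M/VLM runs. A slower edge server or a weaker uplink would flip the ordering, which is the kind of tradeoff the offloading decision has to resolve.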

Paper Structure (11 sections, 4 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Edge-device collaborative Gen SemCom via textual prompts in a typical application, i.e., human-supervised control of teleoperations: (a) on-device prompt generation; (b) offloaded prompt generation.
  • Figure 2: The maximal CCQ and the number of offloaded generations for the proposed framework, FOPG, FODPG, and SUO versus $f_{n,\max}^L$ at the transmitters, where (a) the prompt length is $400$ bits and (b) the prompt length is $600$ bits.
  • Figure 3: (a) The normalized CIDEr/latency of the transmitter-receiver (T-R) pair with the maximal CCQ for the proposed framework and other schemes, with a prompt length of $400$ bits. (b) The maximal CCQ of the proposed framework, ES, and CSA versus $f_{n,\max}^L$ at the transmitters, with a prompt length of $600$ bits.