Diffusion-Aided Bandwidth-Efficient Semantic Communication with Adaptive Requests

Xuesong Wang, Xinyan Xie, Mo Li, Zhaoqian Liu

TL;DR

The paper tackles bandwidth-efficient semantic image transmission by targeting preservation of meaning rather than pixel-perfect reconstruction. It proposes a two-stage approach: the sender first transmits a short caption and a sparse set of latent blocks, and the receiver then uses caption-conditioned diffusion inpainting to fill in the missing content. A receiver-driven ROUGE-L semantic criterion triggers retransmissions when necessary. The main contributions are a receiver-driven semantic acknowledgment mechanism for adaptive latent-block requests and a diffusion-based inpainting framework that fuses sparse latent evidence with text guidance to maintain semantic fidelity at reduced bitrate. Empirical results on Flickr30k show substantial bandwidth savings over pixel-centric and full-latent baselines while maintaining strong semantic alignment (ROUGE-L) and competitive perceptual quality, with a rate-accuracy trade-off that is tunable via the semantic threshold $\tau$ and block granularity $l$.
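
To make the ROUGE-L criterion concrete, the following is a minimal sketch of how a receiver might score a reconstruction against the transmitted caption. It assumes the receiver obtains a text description of the reconstructed image (for example from a captioning model, which is not shown) and compares it to the received caption; that pipeline, the whitespace tokenization, and the function names `lcs_length`, `rouge_l` and `needs_retransmission` are illustrative assumptions rather than the authors' implementation. The default threshold of 0.7 mirrors the $\tau=0.7$ setting reported in the figures.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L F-measure between two strings (assumed whitespace tokenization)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


def needs_retransmission(sent_caption, regenerated_caption, tau=0.7):
    """Hypothetical receiver-side check: request more blocks when the score is below tau."""
    return rouge_l(sent_caption, regenerated_caption) < tau


sent = "a dog runs on the beach"
regenerated = "a cat sits on the sofa"  # stand-in for a caption of the reconstruction
print(rouge_l(sent, regenerated), needs_retransmission(sent, regenerated))
```

Raising `tau` makes the receiver stricter, so more sessions would trigger additional requests; lowering it accepts reconstructions earlier at lower bitrate.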

Abstract

Semantic communication focuses on conveying the intrinsic meaning of data rather than its raw symbolic representation. For visual content, this paradigm shifts from traditional pixel-level transmission toward leveraging the semantic structure of images to communicate visual meaning. Existing approaches are dominated by two routes: using text-only descriptions, which typically under-specify spatial layout and fine-grained appearance details; or transmitting text alongside dense latent visual features, which can over-specify semantics and introduce redundancy and bitrate overhead. A key challenge, therefore, is to reduce semantic redundancy while preserving semantic understanding and visual fidelity, thereby improving overall transmission efficiency. This paper introduces a diffusion-based semantic communication framework with adaptive retransmission. The system transmits concise text descriptions together with a limited set of key latent visual features, and employs a diffusion-based inpainting model to reconstruct the image. A receiver-side semantic consistency mechanism is designed to evaluate the alignment between the reconstructed image and the original text description. When a semantic discrepancy is detected, the receiver triggers a retransmission to request a small set of additional latent blocks and refine the image reconstruction. This approach significantly reduces bandwidth usage while preserving high semantic accuracy, achieving an efficient balance between reconstruction quality and transmission cost.
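
The receiver-driven request loop described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's algorithm: `reconstruct`, `describe` and `score` are hypothetical callables standing in for the diffusion inpainting model, an image captioner and the semantic metric, and the fixed `request_order`, `blocks_per_request` and `max_rounds` parameters are invented for the example, since this summary does not specify how blocks are selected.

```python
from typing import Callable, Dict, Sequence, Tuple


def adaptive_request_session(
    caption: str,
    sender_blocks: Dict[int, object],
    initial_ids: Sequence[int],
    request_order: Sequence[int],
    reconstruct: Callable[[str, Dict[int, object]], object],
    describe: Callable[[object], str],
    score: Callable[[str, str], float],
    tau: float = 0.7,
    blocks_per_request: int = 2,
    max_rounds: int = 8,
) -> Tuple[object, int, float]:
    """Return (reconstruction, rounds used, fraction of blocks transmitted)."""
    # Round 1: caption plus a sparse initial set of latent blocks.
    received = {i: sender_blocks[i] for i in initial_ids}
    pending = [i for i in request_order if i not in received]
    rounds = 1
    while True:
        image = reconstruct(caption, received)
        accepted = score(caption, describe(image)) >= tau
        if accepted or not pending or rounds >= max_rounds:
            return image, rounds, len(received) / len(sender_blocks)
        # Semantic check failed: request a small batch of additional blocks.
        batch, pending = pending[:blocks_per_request], pending[blocks_per_request:]
        for i in batch:
            received[i] = sender_blocks[i]
        rounds += 1


# Toy stand-ins: each "latent block" is one caption word, the "reconstruction"
# is the received words joined in order, and the score is word coverage.
caption = "two dogs play in snow near a fence"
blocks = dict(enumerate(caption.split()))


def toy_reconstruct(_caption, received):
    return " ".join(received[i] for i in sorted(received))


def toy_score(reference, candidate):
    ref = set(reference.split())
    return len(ref & set(candidate.split())) / len(ref)


image, rounds, fraction = adaptive_request_session(
    caption, blocks, initial_ids=[0, 1], request_order=range(8),
    reconstruct=toy_reconstruct, describe=str, score=toy_score,
)
print(image, rounds, fraction)
```

In this toy run the session stops once enough blocks have arrived for the score to reach `tau`, so the transmitted fraction adapts per image instead of being fixed in advance.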

Paper Structure

This paper contains 9 sections, 15 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Proposed system model.
  • Figure 2: System performance over different $\tau$ values. (a) Total transmission rounds $t$. (b) Compression ratio $\kappa$. (c) Round-wise histogram.
  • Figure 3: System average performance. (a)(b)(c)(d) metrics over SNR values at $\tau=0.7$. (e)(f) FID performance over SNR and $\tau$ values.
  • Figure 4: Reconstructions at termination for sessions with final compression ratio (a) $\kappa=0.125$, (b) $\kappa=0.25$, (c) $\kappa=0.375$, under SNR $7$ dB, $l=4$ and $\tau=0.7$. Columns from left to right: original images, full-mask scheme, no-guidance scheme, main scheme.