
Cache-enabled Generative Joint Source-Channel Coding for Evolving Semantic Communications

Shunpu Tang, Qianqian Yang, Jihong Park, Zhaoyang Zhang, Kaibin Huang, Deniz Gunduz

Abstract

Learning-based semantic communication (SemCom) has recently emerged as a promising paradigm for improving the transmission efficiency of wireless networks. However, existing methods typically rely on extensive end-to-end training, which is both inflexible and computationally expensive in dynamic wireless environments. Moreover, they fail to exploit redundancy across multiple transmissions of semantically similar content, limiting overall efficiency. To overcome these limitations, we propose a channel-aware generative adversarial network (GAN) inversion-based joint source-channel coding (CAGI-JSCC) framework that enables training-free SemCom by leveraging a pre-trained SemanticStyleGAN model. By explicitly incorporating wireless channel characteristics into the GAN inversion process, CAGI-JSCC adapts to varying channel conditions without additional training. Furthermore, we introduce a cache-enabled dynamic codebook (CDC) that caches disentangled semantic components at both the transmitter and receiver, allowing the system to reuse previously transmitted content. This semantic-level caching can continuously reduce redundant transmissions as experience accumulates. Extensive experiments on image transmission demonstrate the effectiveness of the proposed framework. In particular, our system achieves comparable perceptual quality with an average bandwidth compression ratio (BCR) of 1/224, and as low as 1/1024 for a single image, significantly outperforming baselines with a BCR of 1/128.
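The caching idea in the abstract can be illustrated with a toy sketch. This is not the paper's CDC algorithm: the class `SemanticCache`, the function `transmit`, the Euclidean distance test, and the `threshold` parameter are all hypothetical, and the link is assumed noiseless so that both caches stay synchronized. It only shows why reusing previously sent components reduces how many full vectors must be transmitted.

```python
import math


class SemanticCache:
    """Toy store of semantic component vectors (hypothetical, for illustration)."""

    def __init__(self, threshold):
        # Assumed reuse rule: a cached vector is reused if it lies within
        # this Euclidean distance of the new component.
        self.threshold = threshold
        self.entries = []

    def lookup(self, vec):
        """Return the index of the closest cached vector within threshold, else None."""
        if not self.entries:
            return None
        dists = [math.dist(vec, e) for e in self.entries]
        best = min(range(len(dists)), key=dists.__getitem__)
        return best if dists[best] <= self.threshold else None

    def add(self, vec):
        """Append a vector and return its index."""
        self.entries.append(tuple(vec))
        return len(self.entries) - 1


def transmit(components, tx_cache, rx_cache):
    """Send each component as a cache index when possible, else as a full vector.

    Returns (components reconstructed at the receiver, number of full vectors sent).
    """
    received, full_sent = [], 0
    for vec in components:
        idx = tx_cache.lookup(vec)
        if idx is not None:
            # Cache hit: only the index would cross the channel.
            received.append(rx_cache.entries[idx])
        else:
            # Cache miss: send the full vector and update both caches
            # (idealized noiseless link, so the two copies stay identical).
            tx_cache.add(vec)
            rx_cache.add(vec)
            received.append(rx_cache.entries[-1])
            full_sent += 1
    return received, full_sent
```

In this sketch, sending the same or a nearly identical component a second time costs only an index, so the number of full transmissions falls as the caches fill.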

Paper Structure

This paper contains 24 sections, 21 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of the proposed channel-aware GAN inversion approach, in which channel characteristics are incorporated into the GAN inversion process to generate channel-adaptive semantic information. The resulting semantic information can then be transmitted directly over noisy channels without explicit channel coding.
  • Figure 2: Illustration of the proposed CDC design, which deploys cache memories at both the transmitter and receiver to store previously transmitted semantic information and serve as a dynamic codebook.
  • Figure 3: Performance comparison of the proposed CAGI-JSCC approach with various baseline methods in terms of PSNR, MS-SSIM, LPIPS, PieAPP, DISTS, and FID versus SNR over the AWGN channel, where the BCR is set to $\rho=1/128$ and the SNR varies from 0 dB to 5 dB.
  • Figure 4: Performance of the proposed CAGI-JSCC approach with imperfect SNR knowledge at the transmitter, where the BCR is set to $\rho=1/128$ and the actual SNR varies from 0 dB to 5 dB.
  • Figure 5: Visual comparison of different methods under SNR = 3 dB.
  • ...and 2 more figures
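
The figure captions above refer to an AWGN channel at a given SNR and to a BCR of $\rho=1/128$. A minimal sketch of both quantities follows, under stated assumptions: the channel is real-valued, the BCR is taken as channel symbols divided by source values (a common convention, not confirmed by this page), and the 256×256×3 image size in the usage note is purely illustrative. The function names `awgn` and `channel_symbols` are hypothetical.

```python
import math
import random


def awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so that signal power / noise power matches snr_db."""
    rng = rng or random.Random()
    power = sum(x * x for x in signal) / len(signal)  # average signal power
    noise_var = power / (10 ** (snr_db / 10))         # convert dB to linear scale
    sigma = math.sqrt(noise_var)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def channel_symbols(num_source_values, bcr):
    """Number of channel symbols implied by a BCR (assumed: symbols / source values)."""
    return round(num_source_values * bcr)
```

For example, under the assumed convention a 256×256×3 image has 196608 source values, so `channel_symbols(196608, 1 / 128)` gives 1536 symbols.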