Variational Source-Channel Coding for Semantic Communication

Yulong Feng, Jing Xu, Liujun Hu, Guanghui Yu, Xiangyang Duan

TL;DR

This paper reframes semantic communication as a rate-distortion problem and argues that joint source-channel coding (JSCC) is necessary for optimal semantic transmission. It introduces Variational Source-Channel Coding (VSCC), which embeds channel effects into the encoder through variational inference and a channel-matching objective, so that the latent distribution adapts to channel conditions. The authors implement a ResNet/Attention-based architecture and compare VSCC with VAE and AE on Mini-ImageNet. VSCC shows improved semantic fidelity (measured by SSIM) and a clearer interpretation of semantic features as latent variance, while AE retains the best data-recovery performance. The work demonstrates that the channel can be treated as part of the joint encoder, with a tunable channel-matching coefficient (CMC) controlling how much distortion to tolerate at different SNRs, and it outlines future improvements in semantic metrics and diffusion-based enhancements.

Abstract

Semantic communication technology is emerging as a pivotal bridge connecting AI with classical communication. Current semantic communication systems are generally modeled as an Auto-Encoder (AE). The AE lacks a deep integration of AI principles with communication strategies because it cannot effectively capture channel dynamics. This gap makes it difficult to justify the need for joint source-channel coding (JSCC) and to explain why performance improves. This paper begins by exploring lossless and lossy communication, highlighting that the inclusion of data distortion distinguishes semantic communication from classical communication. Allowing distortion breaks the conditions under which the separation theorem holds and explains why semantic communication transfers less data. Therefore, employing JSCC becomes imperative for achieving optimal semantic communication. Moreover, a Variational Source-Channel Coding (VSCC) method is proposed for constructing semantic communication systems based on data distortion theory, integrating variational inference and channel characteristics. Using a deep learning network, we develop a semantic communication system employing the VSCC method and demonstrate its capability for semantic transmission. We also establish semantic communication systems of equivalent complexity employing the AE method and the VAE method. Experimental results reveal that the VSCC model offers superior interpretability compared to the AE model, as it clearly captures the semantic features of the transmitted data, represented as the variance of latent variables in our experiments. In addition, the VSCC model exhibits superior semantic transmission capabilities compared to the VAE model. At the same level of data distortion evaluated by PSNR, the VSCC model exhibits stronger human interpretability, which can be partially assessed by SSIM.
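The abstract measures data distortion with PSNR. As a reminder of what that metric computes, here is a minimal sketch of the standard PSNR definition in NumPy. The function name `psnr` and the assumption that pixel values lie in [0, `max_value`] are illustrative choices, not details taken from the paper.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape.

    Assumes pixel values lie in [0, max_value] (an illustrative convention).
    """
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    # Mean squared error over all pixels.
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0.0:
        return float("inf")  # identical inputs: no distortion
    return 10.0 * np.log10(max_value ** 2 / mse)

# A constant error of 0.1 on a [0, 1] scale gives MSE = 0.01, i.e. about 20 dB.
clean = np.zeros((8, 8))
noisy = clean + 0.1
print(round(psnr(clean, noisy), 6))
```

PSNR depends only on the pixel-wise mean squared error, so two reconstructions with the same PSNR can differ in structural quality. This is why the abstract pairs it with SSIM.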

Paper Structure

This paper contains 19 sections, 20 equations, 9 figures, 2 algorithms.

Figures (9)

  • Figure 1: Communication model based on classical communication theory. Basically, it consists of a source module, an encoding module, a channel module, a decoding module, and a destination module. The encoding module is divided into source encoding and channel encoding, which are guaranteed by Shannon's source coding theorem, Shannon's channel coding theorem, and the source-channel separation theorem.
  • Figure 2: Semantic communication model. It is achieved through theoretical analysis in the semantic space and implementation in the data space. The key distinction between the two spaces lies in whether the causality of the physical world is considered. The data space communication model is similar to the classical communication model but utilizes a joint encoding module and a decoding module. The semantic space is primarily defined by a knowledge base (KB), which governs how data is mapped to the semantic space and aids in the encoding and decoding process. The channel that aligns the KBs can be viewed as the semantic channel.
  • Figure 3: Diagram illustrating the mathematical reasoning of the VSCC method. The purpose of this method is to make the received data distribution $q_x(x)$ as close as possible to the original distribution $p_x(x)$. The original data $x$ is first passed through the VSCC encoder to obtain the encoded vector $y$. The encoded vector $y$ passes through the channel to obtain the hidden variable $z$. $z$ stands for feature distributions that can be combined into the original data distribution. The receiver resamples $\hat{y}$ from the hidden variable $z$ and then decodes it into the data $\hat{x}$.
  • Figure 4: Geometric interpretation of the Variational Source-Channel Coding (VSCC) method. The raw data $x$ is a single sample from the original data distribution. It can be encoded by the VSCC encoder and the channel into the latent variable $z$. Each feature distribution represented by $z$ corresponds to a cluster of $x$, forming distinct regions. By resampling different feature distributions, the original data distribution can be reconstructed and represented by $\hat{x}$.
  • Figure 5: The structure of the semantic communication model, constructed using Residual Blocks and Attention Blocks. The model can be trained using Auto-Encoder (AE), Variational AE (VAE), or Variational Source-Channel Coding (VSCC) methods. In the joint encoder, the image $X$ is first normalized and processed through a Convolutional Neural Network (CNN) layer to increase the feature dimension to 32, followed by a sequence of four Residual Blocks and Attention Blocks to extract features across various dimensions. Additional Residual and Attention Blocks finalize the feature dimensions before the data is compressed with a Group Normalization (GN) layer, a Swish activation function, and two CNN layers. The feature dimension $k$ of the encoded vector $Y$ is set to 16. In the decoder, the received latent variable $Z$ is resampled using a reparameterization module and decoded into $\hat{X}$ by reversing the steps of the joint encoder. The Residual Block structure (top right) includes a GN layer, a Swish activation function, and a CNN layer to enhance input features, while the Attention Block structure (bottom right) computes the output using the attention mechanism with inputs $Q$, $K$, and $V$ produced by CNN layers.
  • ...and 4 more figures
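
The data flow described in the Figure 3 caption (encode $x$ into $y$, pass $y$ through the channel to obtain $z$, resample $\hat{y}$ from $z$, decode into $\hat{x}$) can be sketched as a toy NumPy pipeline. This is only an illustration under stated assumptions, not the authors' implementation. The linear encoder and decoder stand in for the ResNet/Attention network; the channel is assumed to be additive white Gaussian noise (AWGN); and $z$ is assumed to be split VAE-style into a mean half and a log-variance half for the reparameterization step. The function names and array sizes are hypothetical.

```python
import numpy as np

def awgn_channel(y, snr_db, rng):
    """Assumed AWGN channel: add Gaussian noise at the given SNR (dB),
    measured against the average power of y."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)

def resample(z, rng):
    """Reparameterization: treat z as [mean | log-variance] (an assumption)
    and draw y_hat = mean + std * eps with eps ~ N(0, I)."""
    mean, log_var = np.split(z, 2, axis=-1)
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps

def vscc_forward(x, w_enc, w_dec, snr_db, rng):
    """x -> y (encoder) -> z (channel) -> y_hat (resample) -> x_hat (decoder)."""
    y = x @ w_enc                      # toy linear stand-in for the joint encoder
    z = awgn_channel(y, snr_db, rng)   # the channel acts on the encoded vector
    y_hat = resample(z, rng)           # receiver resamples from the latent z
    x_hat = y_hat @ w_dec              # toy linear stand-in for the decoder
    return y, z, y_hat, x_hat

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))             # batch of 4 flattened toy inputs
w_enc = rng.standard_normal((64, 16)) * 0.1  # 8 means + 8 log-variances
w_dec = rng.standard_normal((8, 64)) * 0.1
y, z, y_hat, x_hat = vscc_forward(x, w_enc, w_dec, snr_db=10.0, rng=rng)
print(y.shape, z.shape, y_hat.shape, x_hat.shape)
```

Because the noise is injected between the encoder and the resampling step, the channel sits inside the encoding path, which mirrors the summary's point that the channel can be treated as part of the joint encoder.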