Robust Deep Joint Source Channel Coding for Task-Oriented Semantic Communications

Taewoo Park, Eunhye Hong, Yo-Seb Jeon, Namyoon Lee, Yongjune Kim

TL;DR

The paper tackles robustness in task-oriented semantic communications by addressing channel-induced stochasticity in deep JSCC. It introduces a KL-divergence-based regularizer that keeps the noisy posterior close to the noise-free posterior, and derives a tractable form $\mathcal{R} = \frac{\sigma^2}{2} \mathrm{Tr}(\mathcal{I}(\mathbf{z}))$ via a Taylor expansion involving the Fisher information matrix $\mathcal{I}(\mathbf{z})$, where $\sigma^2$ is the channel noise variance. This regularizer reduces the curvature of the log posterior and adapts to channel conditions through $\sigma^2$, while remaining architecture-agnostic and leaving inference complexity unchanged. Empirical results across analog and digital JSCC setups (under AWGN and fading channels) show consistent improvements in task accuracy, especially when training and testing channel conditions differ. The approach offers a practical way to make task-oriented semantic communications more reliable, and may apply more broadly to systems that depend on a posterior.

Abstract

Semantic communications based on deep joint source-channel coding (JSCC) aim to improve communication efficiency by transmitting only task-relevant information. However, ensuring robustness to the stochasticity of communication channels remains a key challenge in learning-based JSCC. In this paper, we propose a novel regularization technique for learning-based JSCC to enhance robustness against channel noise. The proposed method utilizes the Kullback-Leibler (KL) divergence as a regularizer term in the training loss, measuring the discrepancy between two posterior distributions: one under noisy channel conditions (noisy posterior) and one for a noise-free system (noise-free posterior). Reducing this KL divergence mitigates the impact of channel noise on task performance by keeping the noisy posterior close to the noise-free posterior. We further show that the expectation of the KL divergence given the encoded representation can be analytically approximated using the Fisher information matrix and the covariance matrix of the channel noise. Notably, the proposed regularization is architecture-agnostic, making it broadly applicable to general semantic communication systems over noisy channels. Our experimental results validate that the proposed regularization consistently improves task performance across diverse semantic communication systems and channel conditions.
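The analytical approximation described in the abstract can be checked numerically on a toy decoder. The sketch below is not the authors' implementation: it assumes a linear-softmax decoder (hypothetical parameters `W` and `b`), additive white Gaussian noise with variance `sigma2`, and the KL divergence taken from the noise-free posterior to the noisy posterior. All function names are illustrative. It compares the closed-form regularizer $\frac{\sigma^2}{2}\mathrm{Tr}(\mathcal{I}(\mathbf{z}))$ with a Monte Carlo estimate of the expected KL divergence.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def fisher_information(W, b, z):
    # Fisher information of q(y|z) with respect to z for a
    # linear-softmax decoder (an assumed toy model, not the paper's network).
    p = softmax(W @ z + b)
    # Row y holds grad_z log q(y|z) = W^T (e_y - p), written as a row vector.
    scores = (np.eye(p.size) - p) @ W
    # I(z) = sum_y q(y|z) * score_y score_y^T
    return (scores * p[:, None]).T @ scores

def fisher_regularizer(W, b, z, sigma2):
    # Closed-form approximation: (sigma^2 / 2) * Tr(I(z)).
    return 0.5 * sigma2 * np.trace(fisher_information(W, b, z))

def monte_carlo_kl(W, b, z, sigma2, num_samples, rng):
    # Monte Carlo estimate of E_n[ KL(q(y|z) || q(y|z + n)) ], n ~ N(0, sigma2 * I).
    p = softmax(W @ z + b)
    noise = rng.normal(scale=np.sqrt(sigma2), size=(num_samples, z.size))
    p_noisy = softmax((z + noise) @ W.T + b)
    kl = np.sum(p * (np.log(p) - np.log(p_noisy)), axis=-1)
    return kl.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = 0.5 * rng.normal(size=(10, 16))  # 10 classes, 16-dim representation
    b = np.zeros(10)
    z = rng.normal(size=16)
    sigma2 = 1e-3
    print("closed form :", fisher_regularizer(W, b, z, sigma2))
    print("Monte Carlo :", monte_carlo_kl(W, b, z, sigma2, 50000, rng))
```

In a training loop, a term of this kind would be added to the task loss with a weighting coefficient. For a deep decoder, the score vectors would come from automatic differentiation rather than the closed form used here. The agreement between the two estimates is expected to degrade as `sigma2` grows, since the approximation is second order in the noise.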

Paper Structure

This paper contains 14 sections, 2 theorems, 21 equations, 12 figures.

Key Result

Proposition 1

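By treating the KL divergence as a function of $\widehat{\mathbf{z}}$ and applying the second-order Taylor approximation around $\mathbf{z}$, the KL divergence can be approximated as

$$\frac{1}{2}(\widehat{\mathbf{z}}-\mathbf{z})^{\top}\,\mathcal{I}(\mathbf{z})\,(\widehat{\mathbf{z}}-\mathbf{z}),$$

where $\mathcal{I}(\mathbf{z})$ is the Fisher information matrix defined as

$$\mathcal{I}(\mathbf{z}) = \mathbb{E}_{q_{\boldsymbol{\theta}}(\mathbf{y}|\mathbf{z})}\left[\nabla_{\mathbf{z}}\log q_{\boldsymbol{\theta}}(\mathbf{y}|\mathbf{z})\,\nabla_{\mathbf{z}}\log q_{\boldsymbol{\theta}}(\mathbf{y}|\mathbf{z})^{\top}\right].$$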
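As a worked step, the tractable regularizer quoted in the TL;DR follows from the standard second-order form $\frac{1}{2}\mathbf{n}^{\top}\mathcal{I}(\mathbf{z})\mathbf{n}$ of the KL divergence by averaging over the noise. The sketch below assumes an additive channel $\widehat{\mathbf{z}} = \mathbf{z} + \mathbf{n}$ with zero-mean noise of covariance $\boldsymbol{\Sigma}$; the symbol $\boldsymbol{\Sigma}$ and the isotropic special case in the last line are assumptions made here for illustration.

```latex
\begin{align}
\mathbb{E}_{\mathbf{n}}\!\left[\tfrac{1}{2}\,\mathbf{n}^{\top}\mathcal{I}(\mathbf{z})\,\mathbf{n}\right]
  &= \tfrac{1}{2}\,\mathbb{E}_{\mathbf{n}}\!\left[\mathrm{Tr}\!\left(\mathcal{I}(\mathbf{z})\,\mathbf{n}\mathbf{n}^{\top}\right)\right] \\
  &= \tfrac{1}{2}\,\mathrm{Tr}\!\left(\mathcal{I}(\mathbf{z})\,\boldsymbol{\Sigma}\right) \\
  &= \tfrac{\sigma^{2}}{2}\,\mathrm{Tr}\!\left(\mathcal{I}(\mathbf{z})\right)
  \quad \text{when } \boldsymbol{\Sigma} = \sigma^{2}\mathbf{I}.
\end{align}
```

The first line uses the cyclic property of the trace on a scalar, and the second uses $\mathbb{E}[\mathbf{n}\mathbf{n}^{\top}] = \boldsymbol{\Sigma}$ for zero-mean noise. A larger noise variance therefore weights the Fisher-information penalty more heavily, which matches the channel-adaptive behavior described above.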

Figures (12)

  • Figure 1: The system model of learning-based JSCC schemes for classification tasks. The input image data is denoted as $\mathbf{x}$, the encoded representation as $\mathbf{z}$, the channel noise as $\mathbf{n}$, the noisy received representation as $\widehat{\mathbf{z}}$, and the estimated label as $\widehat{\mathbf{y}}$. The encoder maps the image $\mathbf{x}$ to the representation $\mathbf{z}$ based on the conditional probability distribution $p_{\boldsymbol{\phi}}(\mathbf{z}|\mathbf{x})$, while the decoder estimates the label $\widehat{\mathbf{y}}$ from the received representation $\widehat{\mathbf{z}}$ following $q_{\boldsymbol{\theta}}(\mathbf y|\widehat{\mathbf{z}})$. Both $p_{\boldsymbol{\phi}}(\mathbf{z}|\mathbf{x})$ and $q_{\boldsymbol{\theta}}(\mathbf y|\widehat{\mathbf{z}})$ are parametrized by neural networks, which function as the encoder and decoder, respectively.
  • Figure 2: Comparison of the log posterior curvature smoothness under different channel conditions. (a) illustrates the smoothing effect of the proposed regularization under the good channel condition, while (b) shows the smoothing under the poor channel condition. $\theta_0$ denotes the parameter optimized without regularization, while $\theta_1$ and $\theta_2$ denote parameters optimized with regularization for small and large $\sigma^2$, respectively.
  • Figure 3: Visualization of the negative log posterior for the true label, comparing the conventional method and the proposed method. DeepJSCC [Bourtsoulatze2019deep] is chosen as the baseline JSCC model for a classification task, implemented with the architecture from [Shao2022learning, Shao2022vl-vfe] and trained on the CIFAR-10 dataset [Krizhevsky2009learning]. Each row corresponds to results obtained from the same input image. (a) shows the negative log posterior for DeepJSCC, and (b) shows the negative log posterior for the proposed model.
  • Figure 4: Comparison of CIFAR-10 classification error rates between DeepJSCC and the proposed method across PSNRs ranging from 5 dB to 25 dB. Models in (a), (b), and (c) are trained with fixed channel PSNRs of 10 dB, 15 dB, and 20 dB, respectively.
  • Figure 5: Comparison of CIFAR-100 classification error rates between DeepJSCC and the proposed method across PSNRs ranging from 5 dB to 25 dB. Models in (a), (b), and (c) are trained with fixed channel PSNRs of 10 dB, 15 dB, and 20 dB, respectively.
  • ...and 7 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2