D$^2$-JSCC: Digital Deep Joint Source-channel Coding for Semantic Communications

Jianhao Huang, Kai Yuan, Chuan Huang, Kaibin Huang

TL;DR

The paper introduces D$^2$-JSCC, a digital deep joint source-channel coding framework for SemCom that integrates deep source coding with adaptive density modeling and digital channel coding to minimize end-to-end distortion in image transmission. It derives an analytically tractable E2E distortion bound using Bayesian approximations and Lipschitz properties, revealing a trade-off between source and channel rates under a bandwidth constraint. A practical two-step optimization, combining model selection and channel-aware retraining, achieves near-optimal performance with reduced computational burden and demonstrated improvements over standard deep JSCC and separation-based methods. The approach yields both PSNR and perceptual gains (MS-SSIM) across datasets and demonstrates resilience against cliff and leveling-off effects, indicating strong potential for robust SemCom in dynamic wireless environments.

Abstract

Semantic communications (SemCom) have emerged as a new paradigm for supporting sixth-generation applications, where semantic features of data are transmitted using artificial intelligence algorithms to attain high communication efficiencies. Most existing SemCom techniques utilize deep neural networks (DNNs) to implement analog source-channel mappings, which are incompatible with existing digital communication architectures. To address this issue, this paper proposes a novel framework of digital deep joint source-channel coding (D$^2$-JSCC) targeting image transmission in SemCom. The framework features digital source and channel coding that are jointly optimized to reduce the end-to-end (E2E) distortion. First, deep source coding with an adaptive density model is designed to encode semantic features according to their distributions. Second, digital channel coding is employed to protect the encoded features against channel distortion. To facilitate their joint design, the E2E distortion is characterized as a function of the source and channel rates by analyzing a Bayesian model and imposing a Lipschitz assumption on the DNNs. Then, to minimize the E2E distortion, a two-step algorithm is proposed to control the source and channel rates for a given channel signal-to-noise ratio. Simulation results reveal that the proposed framework outperforms classic deep JSCC and mitigates the cliff and leveling-off effects, which commonly exist for separation-based approaches.
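
The source-channel rate trade-off under a bandwidth constraint can be illustrated with simple bit budgeting. The following is a minimal sketch, not the paper's algorithm: it assumes a fixed number of channel uses, a modulation that carries `bits_per_symbol` coded bits per channel use, and a channel code of rate `code_rate`. All names and numbers are illustrative assumptions rather than the paper's notation or settings.

```python
def source_bit_budget(channel_uses: int, bits_per_symbol: int, code_rate: float) -> int:
    """Information bits left for the source coder under a fixed bandwidth budget.

    ASSUMPTION: every channel use carries `bits_per_symbol` coded bits, and a
    fraction `code_rate` of the coded bits are information bits.
    """
    if not 0.0 < code_rate <= 1.0:
        raise ValueError("code_rate must be in (0, 1]")
    coded_bits = channel_uses * bits_per_symbol
    return int(coded_bits * code_rate)


# Hypothetical setting: 4096 channel uses with 2 coded bits per use.
# A lower code rate adds redundancy (more protection) but leaves fewer source bits.
budgets = {rate: source_bit_budget(4096, 2, rate) for rate in (0.25, 0.5, 0.75)}
for rate, bits in budgets.items():
    print(f"code rate {rate}: {bits} source bits")
```

With the channel uses fixed, raising the code rate increases the source bit budget (lower compression distortion) at the cost of weaker error protection, which is the tension the joint rate control has to resolve.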

Paper Structure

This paper contains 25 sections, 5 theorems, 35 equations, 10 figures, and 2 algorithms.

Key Result

Lemma 3.1

Let the block error probability be denoted as $\rho$. There exists a constant $\alpha_{\rho,\bm{\Phi}}\geq1$, depending on $\rho$ and $\bm{\Phi}$, such that the variance of the distorted features $\{\hat{y}_{i}\}$ satisfies a bound [displayed equation missing from the source], where equality holds when $\rho=0$ and $\alpha_{\rho,\bm{\Phi}}=1$.

Figures (10)

  • Figure 1: Architecture comparison of the JSCC schemes empowered by deep learning techniques: (a) traditional deep JSCC; (b) proposed D$^2$-JSCC. The solid and dashed arrows represent the directions of signal flows and optimization paths, respectively.
  • Figure 2: Architectures of the deep source encoder and decoder using the adaptive density model. $\left\lceil \cdot \right\rfloor$, EE, and ED represent quantization, the entropy encoder, and the entropy decoder, respectively.
  • Figure 3: PDF of feature element $y_i$ with different NN structures. The DNNs are pretrained and the test images are randomly cropped to $256\times 256$ pixels.
  • Figure 4: Comparison of the approximate distortion $\hat{\mathcal{D}}_t$ with the simulation results for different NN models. The NN structure is the CNN-based Balle2018 model. The test images are randomly cropped to $256\times 256$ pixels, and the bits per pixel (bpp) is defined as $\frac{R_s}{256\times 256}$.
  • Figure 5: E2E performance of the D$^2$-JSCC system with the joint model selection and rate control algorithm. The experiments are conducted on the Open Image Dataset with random coding, block length $L=512$, and a bandwidth ratio of $0.02$.
  • ...and 5 more figures
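
The quantities in the captions of Figures 4 and 5 can be related with a few lines of arithmetic. This is a minimal sketch: `bits_per_pixel` follows the caption's definition $\frac{R_s}{256\times 256}$, while `channel_uses_from_ratio` assumes the common convention that the bandwidth ratio is the number of channel uses divided by the number of source values (height × width × color channels). That convention is an assumption and may differ from the paper's definition; the function names are hypothetical.

```python
def bits_per_pixel(source_bits: float, height: int = 256, width: int = 256) -> float:
    """bpp as defined in the Figure 4 caption: R_s divided by the pixel count."""
    return source_bits / (height * width)


def channel_uses_from_ratio(bandwidth_ratio: float, height: int, width: int,
                            channels: int = 3) -> int:
    """Channel uses implied by a bandwidth ratio.

    ASSUMPTION: bandwidth ratio = channel uses / (height * width * channels).
    """
    return int(round(bandwidth_ratio * height * width * channels))


# Hypothetical source rate of 32768 bits for one 256x256 crop.
bpp = bits_per_pixel(32768)
# Bandwidth ratio 0.02 from the Figure 5 caption, applied to a 256x256 RGB crop.
uses = channel_uses_from_ratio(0.02, 256, 256)
print(bpp, uses)
```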

Theorems & Definitions (7)

  • Lemma 3.1
  • Theorem 3.1
  • Corollary 3.1.1
  • Remark 3.1
  • Lemma 4.1
  • Theorem 4.1
  • Remark 4.1