MambaJSCC: Deep Joint Source-Channel Coding with Visual State Space Model

Tong Wu, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Wenjun Zhang, Ping Zhang

TL;DR

MambaJSCC introduces a lightweight deep JSCC framework for image transmission that replaces heavy transformer blocks with a visual state space model backed by a CSI-embedding mechanism. By integrating a VSSM-CA backbone and a shared CSI-encoding module, the approach achieves competitive or superior PSNR under AWGN and Rayleigh channels while reducing parameters, MACs, and inference delay. The architecture uses multi-stage patch processing, patch merging/division, and end-to-end training with a distortion loss, enabling effective global feature extraction with linear complexity. Experimental results on DIV2K show that MambaJSCC with CSI embedding outperforms SwinJSCC variants and offers substantial efficiency gains, making it attractive for semantic communication systems with strict latency and resource constraints.

Abstract

Lightweight and efficient deep joint source-channel coding (JSCC) is a key technology for semantic communications. In this paper, we design a novel JSCC scheme named MambaJSCC, which uses a visual state space model with channel adaptation (VSSM-CA) block as its backbone for transmitting images over wireless channels. The VSSM-CA block uses VSSM to integrate two-dimensional images with the state space, enabling feature extraction and encoding to operate with linear complexity. It also incorporates channel state information (CSI) via a newly proposed CSI embedding method. This method deploys a shared CSI encoding module within both the encoder and decoder to encode and inject the CSI into each VSSM-CA block, improving the adaptability of a single model to varying channel conditions. Experimental results show that MambaJSCC not only outperforms Swin Transformer based JSCC (SwinJSCC) but also significantly reduces parameter size, computational overhead, and inference delay (ID). For example, when employing an equal number of VSSM-CA blocks and Swin Transformer blocks, MambaJSCC achieves a 0.48 dB gain in peak signal-to-noise ratio (PSNR) over SwinJSCC while requiring only 53.3% of the multiply-accumulate operations (MACs), 53.8% of the parameters, and 44.9% of the ID.
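The abstract evaluates reconstruction quality as PSNR over AWGN and Rayleigh fading channels. The following is a minimal NumPy sketch of those three ingredients, using the textbook definitions rather than the authors' code. The function names (`awgn_channel`, `rayleigh_channel`, `psnr`), the per-batch power measurement, and the choice to return the fading coefficients to the caller are all assumptions made for illustration; this summary does not state the paper's exact power normalization or equalization.

```python
import numpy as np


def awgn_channel(x, snr_db, rng):
    """Add complex Gaussian noise so the average SNR equals snr_db (assumed model)."""
    signal_power = np.mean(np.abs(x) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Split the noise power evenly between the real and imaginary parts.
    noise = np.sqrt(noise_power / 2.0) * (
        rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    )
    return x + noise


def rayleigh_channel(x, snr_db, rng):
    """Apply y = h * x + n with h ~ CN(0, 1) drawn per symbol (assumed model).

    Returns the received symbols and the fading coefficients, so the caller
    can decide how to use the CSI (for example, equalization).
    """
    h = (rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)) / np.sqrt(2.0)
    signal_power = np.mean(np.abs(x) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.sqrt(noise_power / 2.0) * (
        rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    )
    return h * x + noise, h


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

In a JSCC pipeline of this kind, the encoder output would be passed through one of the channel functions before decoding, and `psnr` would compare the decoded image with the original. The SNR value passed to the channel is the quantity that a CSI-aware model would also receive as side information.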

Paper Structure

This paper contains 12 sections, 9 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: (a) The overall architecture of the proposed MambaJSCC. (b) The structure of the VSSM-CA block.
  • Figure 2: The computational flow of the V-S6 module.
  • Figure 3: Visual comparison between MambaJSCC w/o CA and SwinJSCC w/o SA&RA with $N_m=N_s=6$ under the AWGN and the Rayleigh fading channel with SNR=$5$ dB.
  • Figure 4: The PSNR performance of MambaJSCC w/o CA and SwinJSCC w/o SA&RA versus SNR under the AWGN and the Rayleigh fading channels. $N_m$ and $N_s$ are set to $6$.
  • Figure 5: The PSNR performance with different channel adaptation methods.