Robust Image Semantic Coding with Learnable CSI Fusion Masking over MIMO Fading Channels

Bingyan Xie; Yongpeng Wu; Yuxuan Shi; Wenjun Zhang; Shuguang Cui; Merouane Debbah

TL;DR

This work tackles robust semantic image transmission over practical MIMO fading channels by fusing CSI into the semantic encoder. The proposed LCFSC framework uses a non-invasive CSI fusion multi-head attention module (NI-CFMA) within a Swin Transformer backbone together with a learnable mask ratio, complemented by a noise-purified channel estimator (NPN) and an R-CVAE-based recurrent condition generation stage that produces suitable conditions for the masking. Key contributions include: (i) a CSI-aware semantic coding paradigm, (ii) a masking-based attention mechanism that preserves essential semantics under fading, (iii) a learnable mask ratio obtained via recurrent conditioning, and (iv) end-to-end training that combines reconstruction and ELBO-based objectives. Experimental results on the UDIS-D dataset show that LCFSC consistently outperforms traditional SSCC schemes and state-of-the-art Swin-based semantic frameworks across SNRs and CBRs, suggesting practical potential for CSI-aware semantic communication in 5G/6G systems.

Abstract

Despite remarkable progress in various scenarios, existing semantic communication frameworks mainly consider single-input single-output Gaussian channels or Rayleigh fading channels and neglect the widely used multiple-input multiple-output (MIMO) channels, which hinders their application in practical systems. One common solution to combat MIMO fading is to utilize feedback MIMO channel state information (CSI). In this paper, we incorporate MIMO CSI into the system design from a new perspective and propose the learnable CSI fusion semantic communication (LCFSC) framework, where the semantic extractor treats CSI as side information to enhance semantic coding. To avoid invasive feature fusion caused by abruptly combining CSI with features, we present a non-invasive CSI fusion multi-head attention module inside the Swin Transformer. The learned attention masking map, determined by both source and channel states, yields a more robust attention distribution. Furthermore, the percentage of masked elements can be flexibly adjusted by a learnable mask ratio, which is produced by conditional variational inference in an unsupervised manner. In this way, CSI-aware semantic coding is achieved through learnable CSI fusion masking. Experimental results demonstrate the superiority of LCFSC over traditional schemes and state-of-the-art Swin Transformer-based semantic communication frameworks in MIMO fading channels.
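To make the idea of masking an attention map under a mask ratio concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention head, and it assumes a hypothetical per-entry importance matrix `importance` that stands in for the learned masking map (which, in the paper, is determined by both source features and CSI). The function name `masked_attention` and the rule "mask the lowest-importance entries in each row" are illustrative assumptions only.

```python
import numpy as np

def masked_attention(q, k, v, importance, mask_ratio):
    """Single-head scaled dot-product attention with a ratio-controlled mask.

    q, k, v      : arrays of shape (n, d)
    importance   : (n, n) scores; a hypothetical stand-in for a learned
                   masking map (assumption, not the paper's module)
    mask_ratio   : fraction in [0, 1) of entries masked in each row
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)

    # Number of masked entries per row; always keep at least one entry.
    num_masked = min(int(np.floor(mask_ratio * n)), n - 1)

    mask = np.zeros((n, n), dtype=bool)
    if num_masked > 0:
        # Columns with the lowest importance in each row get masked.
        idx = np.argsort(importance, axis=-1)[:, :num_masked]
        np.put_along_axis(mask, idx, True, axis=-1)

    # Masked entries receive -inf so that softmax assigns them zero weight.
    scores = np.where(mask, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights, mask

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
importance = rng.normal(size=(8, 8))
out, weights, mask = masked_attention(q, k, v, importance, mask_ratio=0.25)
```

In this sketch the mask ratio is a fixed input. In the framework described above it would instead be produced by the conditional variational inference stage, which is not modelled here.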

Paper Structure (30 sections, 24 equations, 15 figures, 2 tables, 1 algorithm)

Introduction
System Model and Proposed Framework
System Model
Proposed Framework of CFSC
Detailed Structure of CFSC
Non-invasive CSI Fusion Semantic Extractor
Noise Purified Channel Estimator
Other Network Structure of the JSCC structure
Learnable CSI Fusion Semantic Communication Framework
The Recurrent Condition Generation Stage
R-CVAE for Generating the Suitable Conditions
Training Loss
Training Strategy
Numerical Results
Experimental Setups
...and 15 more sections

Figures (15)

Figure 1: Different usages of feedback CSI. (a) and (b): Common semantic communication schemes with CSI feedback. (c): A novel scheme fusing CSI as side information into the semantic encoder.
Figure 2: CSI fusion semantic communication (CFSC) framework.
Figure 3: (a) Network architecture of the CSI fusion semantic encoder $f_{e}$. (b) Two successive Swin Transformer blocks with different attention modules.
Figure 4: Non-invasive CSI fusion multi-head attention module.
Figure 5: The structure of the CSI fusion masking model. Left is the overall network, while right illustrates the attention map masking strategy.
...and 10 more figures
