Semantic MIMO Systems for Speech-to-Text Transmission

Zhenzi Weng, Zhijin Qin, Huiqiang Xie, Xiaoming Tao, Khaled B. Letaief

TL;DR

The paper tackles semantic communications for speech-to-text over MIMO channels by introducing SAC-ST, which combines a transformer-based semantic encoder with a semantic-aware network that places meaning-rich content on high-SNR subchannels. It extends this framework with a neural channel-estimation network (ChanEst) to reduce dependence on perfect CSI, enabling practical deployment in SU-MIMO and MU-MIMO settings. The approach uses a two-stage training regime for DeepSC-ST 2.0 and a separate SA module, with loss functions ${\mathcal L}_{CTC}$ and ${\mathcal L}_{SA}$ to optimize semantic fidelity; the SU-MIMO testing stage sorts semantics by importance before transmission. Numerical results on LibriSpeech in Rayleigh fading show substantial gains in WER and sentence similarity, particularly at low SNRs, and ChanEst-based SAC-ST achieves performance close to the perfect-CSI benchmark, which supports the practical viability of semantically guided speech transmission over MIMO. The work also outlines a path for extending semantic MIMO to multilingual and more complex tasks, addressing robustness and fairness across users.
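The idea of placing important semantics on high-SNR subchannels can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the importance scores are hypothetical hand-picked numbers (in SAC-ST they would come from the learned semantic-aware network), the function name `transmit_sorted` is invented for illustration, and the receiver uses plain per-subchannel scaling after SVD combining, which is an assumption.

```python
import numpy as np

def transmit_sorted(H, features, importance, noise_std=0.0, rng=None):
    """Send one feature per SVD subchannel, most important on the strongest.

    H:          (n, n) channel matrix, assumed known at both ends (perfect CSI)
    features:   (n,) feature symbols for a single channel use
    importance: (n,) hypothetical importance scores, larger = more critical
    """
    rng = np.random.default_rng() if rng is None else rng
    # NumPy returns singular values in descending order, so subchannel 0
    # has the largest gain and therefore the highest effective SNR.
    U, s, Vh = np.linalg.svd(H)

    # Sort features so the most important one is mapped to subchannel 0.
    order = np.argsort(-importance)
    z = features[order].astype(complex)

    # SVD precoding at the transmitter: x = V z.
    x = Vh.conj().T @ z

    # Channel with complex Gaussian noise.
    n_rx = H.shape[0]
    noise = noise_std * (rng.standard_normal(n_rx)
                         + 1j * rng.standard_normal(n_rx)) / np.sqrt(2)
    y = H @ x + noise

    # Combining with U^H yields parallel subchannels: U^H y = diag(s) z + noise.
    z_hat = (U.conj().T @ y) / s

    # Undo the importance sort so features return to their original positions.
    recovered = np.empty_like(z_hat)
    recovered[order] = z_hat
    return recovered, order, s

rng = np.random.default_rng(0)
n = 4
H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
features = rng.standard_normal(n)
importance = np.array([0.1, 0.9, 0.4, 0.7])  # hypothetical scores
recovered, order, gains = transmit_sorted(H, features, importance, noise_std=0.0)
```

With noise added, the per-subchannel noise after scaling has standard deviation `noise_std / s[k]`, so features mapped to weaker subchannels are recovered less accurately; the sort ensures those are the less important ones.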

Abstract

Semantic communications have been utilized to execute numerous intelligent tasks by transmitting task-related semantic information instead of bits. In this article, we propose a semantic-aware speech-to-text transmission system for the single-user multiple-input multiple-output (MIMO) and multi-user MIMO communication scenarios, named SAC-ST. Particularly, a semantic communication system to serve the speech-to-text task at the receiver is first designed, which compresses the semantic information and generates the low-dimensional semantic features by leveraging the transformer module. In addition, a novel semantic-aware network is proposed to facilitate transmission with high semantic fidelity by identifying the critical semantic information and guaranteeing its accurate recovery. Furthermore, we extend the SAC-ST with a neural network-enabled channel estimation network to mitigate the dependence on accurate channel state information and validate the feasibility of SAC-ST in practical communication environments. Simulation results will show that the proposed SAC-ST outperforms the communication framework without the semantic-aware network for speech-to-text transmission over the MIMO channels in terms of the speech-to-text metrics, especially in the low signal-to-noise regime. Moreover, the SAC-ST with the developed channel estimation network is comparable to the SAC-ST with perfect channel state information.
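Word error rate (WER), one of the speech-to-text metrics named in the TL;DR, is conventionally the word-level edit distance between the reference and recognized transcripts divided by the number of reference words. The sketch below shows that standard definition; it assumes whitespace tokenization and no text normalization, and the paper's exact evaluation script may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1 when the hypothesis is much longer than the reference.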

Paper Structure

This paper contains 16 sections, 21 equations, 23 figures, 6 tables, 4 algorithms.

Figures (23)

  • Figure 1: Conventional SU-MIMO communication system with SVD precoding.
  • Figure 2: Conventional communication system.
  • Figure 3: Semantic communication system.
  • Figure 5:
  • Figure 6:
  • ...and 18 more figures