Generative Semantic Communication for Text-to-Speech Synthesis

Jiahao Zheng; Jinke Ren; Peng Xu; Zhihao Yuan; Jie Xu; Fangxin Wang; Gui Gui; Shuguang Cui

TL;DR

This paper proposes a generative semantic communication framework for text-to-speech (TTS) synthesis that uses a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead.

Abstract

Semantic communication is a promising technology for improving communication efficiency by transmitting only the semantic information of the source data. However, traditional semantic communication methods primarily focus on data reconstruction tasks, which may not be efficient for emerging generative tasks such as text-to-speech (TTS) synthesis. To address this limitation, this paper develops a novel generative semantic communication framework for TTS synthesis, leveraging generative artificial intelligence technologies. First, we use a pre-trained large speech model called WavLM and the residual vector quantization method to construct two semantic knowledge bases (KBs), one at the transmitter and one at the receiver. The KB at the transmitter enables effective semantic extraction, while the KB at the receiver facilitates lifelike speech synthesis. Then, we employ a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead. Finally, numerical results demonstrate that our framework achieves much higher fidelity for the generated speech than four baselines, over both additive white Gaussian noise (AWGN) and Rayleigh fading channels.
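The abstract names residual vector quantization as the method used to build the semantic KBs but does not describe how it works. The following is a minimal sketch of generic residual vector quantization, not the authors' implementation: each stage picks the codeword nearest to the current residual, and the next stage quantizes what is left over. The function names (`nearest_index`, `rvq_encode`, `rvq_decode`) and the two-stage `TOY_CODEBOOKS` values are hypothetical and chosen only for illustration; in the paper the quantized features come from WavLM, which is not modeled here.

```python
from typing import List, Sequence

Vector = List[float]
Codebook = Sequence[Sequence[float]]


def nearest_index(codebook: Codebook, target: Sequence[float]) -> int:
    """Return the index of the codeword closest to target (squared Euclidean distance)."""
    def sq_dist(codeword: Sequence[float]) -> float:
        return sum((c - t) ** 2 for c, t in zip(codeword, target))

    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(x: Sequence[float], codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize x stage by stage; each stage encodes the residual left by the previous one."""
    residual = list(x)
    indices = []
    for codebook in codebooks:
        idx = nearest_index(codebook, residual)
        indices.append(idx)
        # Subtract the chosen codeword so the next stage refines the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Reconstruct the vector as the sum of the selected codewords from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse stage followed by a fine stage.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]

indices = rvq_encode([1.2, 0.9], TOY_CODEBOOKS)
reconstruction = rvq_decode(indices, TOY_CODEBOOKS)
```

For the toy input `[1.2, 0.9]`, the encoder selects index 1 in both stages and the decoder returns `[1.25, 1.0]`. Only the short list of indices would need to be sent, provided both ends hold the same codebooks, which is consistent with the abstract's description of matching KBs at the transmitter and receiver.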

Paper Structure (11 sections, 12 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
System Model
Semantic KBs and Semantic Decoder
Design of Semantic KBs
Design of Semantic Decoder
Two-stage Training Algorithm
Experimental Results
Experiment Settings
Performance Comparison with Different Channels
Performance Comparison with Different Communication Budgets
Conclusion

Figures (6)

Figure 1: Generative semantic communication system for text-to-speech synthesis.
Figure 2: Illustration of the semantic KBs.
Figure 3: Illustration of the prior encoder.
Figure 4: WER and SPK of different schemes with AWGN channel.
Figure 5: WER and SPK of different schemes with Rayleigh fading channel.
...and 1 more figure
