Model-free Reinforcement Learning of Semantic Communication by Stochastic Policy Gradient

Edgar Beck; Carsten Bockelmann; Armin Dekorsy

TL;DR

This work applies the Stochastic Policy Gradient to design a semantic communication system by reinforcement learning, separating transmitter and receiver, and not requiring a known or differentiable channel model, a crucial step towards deployment in practice.

Abstract

Following the recent success of Machine Learning tools in wireless communications, the idea of semantic communication by Weaver from 1949 has gained attention. It breaks with Shannon's classic design paradigm by aiming to transmit the meaning, i.e., semantics, of a message instead of its exact version, allowing for information rate savings. In this work, we apply the Stochastic Policy Gradient (SPG) to design a semantic communication system by reinforcement learning, separating transmitter and receiver, and not requiring a known or differentiable channel model, a crucial step towards deployment in practice. Further, we derive the use of SPG for both classic and semantic communication from the maximization of the mutual information between received and target variables. Numerical results show that our approach achieves comparable performance to a model-aware approach based on the reparametrization trick, albeit with a decreased convergence rate.
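To make the model-free idea concrete, the following is a minimal toy sketch of a stochastic policy gradient update. It is not the authors' system: the scalar Gaussian policy, the clipping channel `black_box_channel`, the quadratic `reward` standing in for decoder feedback, and all hyperparameters are assumptions chosen for illustration. The point it shows is that the transmitter parameter is updated only from sampled transmit signals and scalar rewards, so the channel never has to be differentiated.

```python
import numpy as np


def black_box_channel(x, rng):
    # Hypothetical channel: hard clipping plus additive Gaussian noise.
    # The learner only observes its output and never its gradient.
    return np.clip(x, -2.0, 2.0) + 0.1 * rng.standard_normal(x.shape)


def reward(y, target):
    # Hypothetical reward: negative squared error at the receiver,
    # standing in for the feedback a trained decoder would send back.
    return -(y - target) ** 2


def train_spg(target=1.0, sigma=0.3, lr=0.05, steps=500, batch=64, seed=0):
    rng = np.random.default_rng(seed)
    mu = 0.0  # transmitter parameter: mean of the Gaussian policy
    for _ in range(steps):
        # Exploration: sample transmit signals from the policy N(mu, sigma^2).
        x = mu + sigma * rng.standard_normal(batch)
        y = black_box_channel(x, rng)
        r = reward(y, target)
        # Score function of a Gaussian policy with respect to its mean.
        score = (x - mu) / sigma**2
        # Policy gradient estimate; the batch-mean baseline reduces variance.
        grad = np.mean((r - r.mean()) * score)
        mu += lr * grad  # gradient ascent on the expected reward
    return mu


if __name__ == "__main__":
    print(train_spg())
```

In this toy setting the learned mean should settle near the target, since the expected reward is largest when the transmitted signal passes the channel close to the target value. A model-aware alternative would instead backpropagate through a differentiable channel model, which is the reparametrization-trick baseline the abstract compares against.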

Paper Structure (23 sections, 12 equations, 6 figures, 1 table)

Introduction
Semantic Communication Framework
Semantic System Model
Semantic Source and Channel
Semantic Channel Encoding
Semantic Communication Design
InfoMax Principle
Information Bottleneck View
Stochastic Policy Gradient-based Reinforcement Learning
Stochastic Gradient Descent-based Optimization
Reinforce Gradient
Reparametrization Trick
Stochastic Policy Gradient
Stochastic Policy
Alternating RL-based Training
...and 8 more sections

Figures (6)

Figure 1: Block diagram of the considered semantic system model.
Figure 2: Optimization procedure of a semantic encoder and decoder without a differentiable channel model: 1. Train the decoder supervised based on the training sequence and updated encoder but without sampler. 2. Encoder explores transmit signals $\bm{\mathsf{x}}_{i}$ and improves its policy according to the decoder reward feedback. 3. Alternate between both steps until convergence.
Figure 3: RL-SINFONY scenario: Four distributed agents extract features for rate-efficient transmission to a decoder that extracts semantics.
Figure 4: Comparison of the classification error rate of RL-SINFONY and SINFONY with $N_{\textrm{Tx}}=14$ on MNIST as a function of normalized SNR.
Figure 5: Comparison of training convergence between RL-SINFONY and SINFONY with $N_{\textrm{Tx}}=14$ in terms of the cross-entropy loss on MNIST averaged over $10$ runs as a function of training epochs $N_{\textrm{e}}$.
...and 1 more figures
