Structure-Enhanced DRL for Optimal Transmission Scheduling

Jiazheng Chen, Wanchun Liu, Daniel E. Quevedo, Saeed R. Khosravirad, Yonghui Li, Branka Vucetic

TL;DR

This paper develops a structure-enhanced deep reinforcement learning framework for optimal scheduling of a remote estimation system, with the goal of minimizing the overall estimation mean-square error (MSE). It also proposes a structure-enhanced action selection method, which tends to select actions that obey the policy structure.

Abstract

Remote state estimation of large-scale distributed dynamic processes plays an important role in Industry 4.0 applications. In this paper, we focus on the transmission scheduling problem of a remote estimation system. First, we derive structural properties of the optimal sensor scheduling policy over fading channels. Then, building on these theoretical guidelines, we develop a structure-enhanced deep reinforcement learning (DRL) framework for optimal scheduling of the system to achieve the minimum overall estimation mean-square error (MSE). In particular, we propose a structure-enhanced action selection method, which tends to select actions that obey the policy structure. This explores the action space more effectively and enhances the learning efficiency of DRL agents. Furthermore, we introduce a structure-enhanced loss function that penalizes actions that do not follow the policy structure. The new loss function guides the DRL agent to converge to the optimal policy structure quickly. Our numerical experiments illustrate that the proposed structure-enhanced DRL algorithms can reduce the training time by 50% and the remote estimation MSE by 10% to 25% when compared to benchmark DRL algorithms. In addition, we show that the derived structural properties exist in a wide range of dynamic scheduling problems that go beyond remote state estimation.
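
The idea of an action selection rule that "tends to select actions that obey the policy structure" can be illustrated with a small sketch. This is not the authors' algorithm: it assumes an epsilon-greedy agent whose exploration step is sometimes replaced by an action suggested by the assumed policy structure. The names `select_action`, `structure_hint`, and `p_structure` are hypothetical and introduced only for this illustration.

```python
import random


def select_action(q_values, structure_hint, epsilon, p_structure, rng=random):
    """Pick an action index (e.g. which sensor to schedule).

    q_values:       list of Q-value estimates, one per candidate action.
    structure_hint: action suggested by the assumed policy structure,
                    or None when no suggestion is available (hypothetical).
    epsilon:        probability of exploring instead of acting greedily.
    p_structure:    when exploring, probability of following the hint
                    rather than sampling uniformly (hypothetical knob).
    """
    # Greedy action under the current Q-value estimates.
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])

    # Exploit with probability 1 - epsilon.
    if rng.random() >= epsilon:
        return greedy

    # Explore: prefer the structure-consistent action when one is known.
    if structure_hint is not None and rng.random() < p_structure:
        return structure_hint

    # Otherwise fall back to uniform random exploration.
    return rng.randrange(len(q_values))
```

With `p_structure = 0` the rule reduces to plain epsilon-greedy, so the extra parameter only controls how strongly exploration is biased toward the assumed structure.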

Paper Structure (29 sections, 9 theorems, 77 equations, 8 figures, 2 tables, 2 algorithms)

Key Result

Lemma 1

If the optimal policy exists, then the operator $\mathsf{B}$ has a unique fixed point $V^{*} \in \mathcal{V}$ and for all $V^{0} \in \mathcal{V}$, the sequence $\{V^{\tilde{t}}\}$ defined by $V^{\tilde{t}+1} = \mathsf{B} [V^{\tilde{t}}]$ converges in norm to $V^{*}$.
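
The fixed-point iteration in Lemma 1 has the same shape as textbook value iteration. The sketch below is a simplification under stated assumptions, not the paper's setting: it assumes a finite state and action space, a discounted cost criterion with factor `gamma`, and a cost-minimizing Bellman operator. The arrays `P` (transition probabilities, indexed `P[a][s][t]`) and `c` (per-stage costs, indexed `c[s][a]`) are hypothetical inputs.

```python
def bellman_operator(V, P, c, gamma):
    """Apply B[V](s) = min_a { c[s][a] + gamma * sum_t P[a][s][t] * V[t] }."""
    n_states = len(V)
    n_actions = len(P)
    return [
        min(
            c[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def value_iteration(P, c, gamma=0.9, V0=None, tol=1e-10, max_iter=10_000):
    """Iterate V <- B[V] until successive iterates differ by less than tol."""
    V = list(V0) if V0 is not None else [0.0] * len(c)
    for _ in range(max_iter):
        V_next = bellman_operator(V, P, c, gamma)
        # Sup-norm distance between successive iterates.
        if max(abs(x - y) for x, y in zip(V_next, V)) < tol:
            return V_next
        V = V_next
    return V
```

Starting the iteration from two different initial vectors `V0` and comparing the results is a quick numerical check of the "for all $V^{0}$" part of the statement in this simplified setting.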

Figures (8)

  • Figure 1: Remote state estimation system with $N$ processes and $M$ channels.
  • Figure 2: Structure of the optimal scheduling policy with $N=2$ and $M=1$, where $\bullet$ and $\times$ represent the schedule of sensor 1 and 2, respectively.
  • Figure 3: The optimal policy of a two-sensor-single-channel scheduling problem with the multiplicative reward function, where $\bullet$ and $\times$ represent the schedule of sensor 1 and 2, respectively.
  • Figure 4: Average sum MSE of all processes during training with $N=6, M=3$.
  • Figure 5: Average sum MSE of all processes during training with $N=10, M=5$.
  • ...and 3 more figures

Theorems & Definitions (23)

  • Definition 1: Channel-State Threshold Policy
  • Definition 2: AoI-State Threshold Policy
  • Lemma 1: [puterman1990markov, hernandez2012further]
  • Lemma 2: Monotonicity
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • Remark 1: Analytical Challenges
  • proof
  • ...and 13 more