Deep Reinforcement Learning for Wireless Scheduling in Distributed Networked Control

Gaoyang Pang, Kang Huang, Daniel E. Quevedo, Branka Vucetic, Yonghui Li, Wanchun Liu

TL;DR

To tackle the challenges of a large action space in DRL, this work proposes novel action space reduction and action embedding methods for the DRL framework that can be applied to various algorithms, including deep Q-network (DQN), deep deterministic policy gradient (DDPG), and twin delayed DDPG (TD3).

Abstract

We consider a joint uplink and downlink scheduling problem of a fully distributed wireless networked control system (WNCS) with a limited number of frequency channels. Using elements of stochastic systems theory, we derive a sufficient stability condition of the WNCS, which is stated in terms of both the control and communication system parameters. Once the condition is satisfied, there exists a stationary and deterministic scheduling policy that can stabilize all plants of the WNCS. By analyzing and representing the per-step cost function of the WNCS in terms of a finite-length countable vector state, we formulate the optimal transmission scheduling problem into a Markov decision process and develop a deep reinforcement learning (DRL) based framework for solving it. To tackle the challenges of a large action space in DRL, we propose novel action space reduction and action embedding methods for the DRL framework that can be applied to various algorithms, including Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3). Numerical results show that the proposed algorithm significantly outperforms benchmark policies.
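To give a feel for why the action space becomes large, the following is a minimal counting sketch, not the paper's exact action model. It assumes, purely for illustration, that each of the $N$ plants has one uplink and one downlink (so $2N$ candidate links) and that each of the $M$ channels is assigned to a distinct link. The function name `count_schedules` is hypothetical.

```python
import math


def count_schedules(num_plants: int, num_channels: int) -> int:
    """Count ordered assignments of channels to distinct links.

    Assumption (illustrative only): every plant contributes two links
    (uplink and downlink), and each channel serves exactly one link,
    with no link served by more than one channel.
    """
    num_links = 2 * num_plants
    if num_channels > num_links:
        # More channels than links: under this assumption no full
        # assignment of all channels to distinct links exists.
        return 0
    # Number of ways to pick an ordered selection of `num_channels`
    # links out of `num_links`, i.e. (2N)! / (2N - M)!.
    return math.perm(num_links, num_channels)
```

Under these assumptions, a system with $N = 5$ and $M = 5$ already has 30240 candidate schedules, and $N = 10$, $M = 10$ has 670442572800, which illustrates why enumerating one output per action quickly becomes impractical.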

Paper Structure

This paper contains 21 sections, 3 theorems, 44 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Consider the index set $\{\mathcal{F}_m\}$ as introduced above and define $\rho^{\max}_{m} \triangleq \max_{i\in \mathcal{F}_m}\rho^2(\mathbf{A}_i)$, $\bar{\xi}^{\max}_{m} \triangleq \max_{i\in \mathcal{F}_m}\{\bar{\xi}^s_{m,i},\bar{\xi}^c_{m,i}\}$, $\bar{\xi}^s_{m,i} \triangleq 1 - \xi^s_{m,i}$, … where the operation $\min$ is taken over the set of all possible partitions $(\mathcal{F}_1,\dots,$ …
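The quantities defined in the theorem statement can be computed numerically. The sketch below evaluates only $\rho^{\max}_{m}$ and $\bar{\xi}^{\max}_{m}$ for one group $\mathcal{F}_m$; it does not reproduce the stability condition itself, which is cut off in the text above. It assumes that $\rho(\cdot)$ denotes the spectral radius, that $\xi^s_{m,i}$ and $\xi^c_{m,i}$ are probabilities indexed by channel $m$ and plant $i$, and that $\bar{\xi}^c_{m,i} \triangleq 1 - \xi^c_{m,i}$ by analogy with $\bar{\xi}^s_{m,i}$. The function and argument names are hypothetical.

```python
import numpy as np


def spectral_radius(A: np.ndarray) -> float:
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def group_quantities(group, A, xi_s, xi_c, m):
    """Return (rho_max_m, xi_bar_max_m) for the plants in `group`.

    group : iterable of plant indices i in F_m
    A     : list of system matrices A_i
    xi_s, xi_c : nested sequences with xi_s[m][i], xi_c[m][i] in [0, 1]
    m     : channel index

    Assumption: xi_bar^c is defined as 1 - xi^c, mirroring xi_bar^s.
    """
    # rho_max_m = max over i in F_m of rho(A_i)^2
    rho_max = max(spectral_radius(A[i]) ** 2 for i in group)
    # xi_bar_max_m = max over i in F_m of {1 - xi^s_{m,i}, 1 - xi^c_{m,i}}
    xi_bar_max = max(
        max(1.0 - xi_s[m][i], 1.0 - xi_c[m][i]) for i in group
    )
    return rho_max, xi_bar_max


# Illustrative numbers only; they are not taken from the paper.
A_example = [np.diag([1.1, 0.5]), np.array([[1.2, 1.0], [0.0, 0.9]])]
xi_s_example = [[0.90, 0.80]]
xi_c_example = [[0.95, 0.85]]
rho_max_0, xi_bar_max_0 = group_quantities(
    [0, 1], A_example, xi_s_example, xi_c_example, m=0
)
```

With these made-up values, the second plant dominates both maxima: its spectral radius is 1.2, so $\rho^{\max}_{0} \approx 1.44$, and its smaller uplink probability gives $\bar{\xi}^{\max}_{0} \approx 0.2$.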

Figures (6)

  • Figure 1: A distributed networked control system with $N$ plants sharing $M$ frequency channels. Kalman filter, remote estimator and control algorithm are denoted as KF, RE and CA and discussed in Sections \ref{sec:KF}, \ref{sec:RE} and \ref{sec:control}, respectively.
  • Figure 2: Illustration of the state parameters of plant $i$.
  • Figure 3: The long-term average performance of DRL and benchmark algorithms over a system with $N = 5$ and $M = 5$. Values are means $\pm$ standard error of mean. RAS denotes reduced action space.
  • Figure 4: The long-term average performance of DRL and benchmark algorithms over a system with $N = 10$ and $M = 10$. Values are means $\pm$ standard error of mean.
  • Figure 5: The long-term average performance of DRL and benchmark algorithms over a system with $N = 8$ and $M = 6$. Values are means $\pm$ standard error of mean.
  • ...and 1 more figure

Theorems & Definitions (11)

  • Remark 1
  • Definition 1: Mean-Square Stability
  • Theorem 1: Stabilizability
  • proof
  • Example 1
  • Remark 2
  • Proposition 1
  • Proposition 2
  • Remark 3
  • Remark 4
  • ...and 1 more