InterQ: A DQN Framework for Optimal Intermittent Control

Shubham Aggarwal, Dipankar Maity, Tamer Başar

TL;DR

InterQ tackles joint communication-control co-design for discrete-time stochastic linear systems with a scheduler and a controller, balancing transmission cost and control performance via the objective $J=\mathbb{E}\left[\sum_{k=0}^\infty \gamma^k ( x_k^T Q x_k + u_k^T R u_k + \lambda a_k )\right]$. By exploiting a separation principle, the authors derive a linear optimal controller based on the estimator state and formulate the scheduler as a Markov decision process over the estimation error; they then introduce InterQ, a deep Q-learning framework that learns the scheduling policy by approximating the Q-function with a neural network, stabilized through experience replay and a target network. Numerical results on a 2D unstable Gauss-Markov-like system show that InterQ achieves superior Pareto-optimal trade-offs between control cost and communication cost compared with periodic and event-triggered baselines, and reveal an ellipse-like scheduling region aligned with the theoretical threshold conditions. The work provides a practical, open-source RL-based co-design tool for intermittent control, with insights on training stability, memory effects, and robustness to noise distributions.
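The two stabilizers named above, experience replay and a target network, can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the class `ReplayBuffer`, the function `td_targets`, the discount value, and the action encoding (0 = stay silent, 1 = transmit) are all hypothetical names and choices. Because the objective $J$ is a cost, the Bellman target takes a minimum over next actions rather than the maximum used in reward-based DQN.

```python
import random
from collections import deque

import numpy as np

GAMMA = 0.95  # discount factor (assumed value, not taken from the paper)


class ReplayBuffer:
    """Fixed-size FIFO store of (e, a, cost, e_next) transitions."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest transition when full
        self.data = deque(maxlen=capacity)

    def push(self, e, a, cost, e_next):
        self.data.append(
            (np.asarray(e, float), int(a), float(cost), np.asarray(e_next, float))
        )

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement breaks temporal correlation
        batch = rng.sample(list(self.data), batch_size)
        e, a, c, e_next = zip(*batch)
        return np.stack(e), np.array(a), np.array(c), np.stack(e_next)


def td_targets(q_target, cost, e_next, gamma=GAMMA):
    """Bellman targets for a cost-minimizing Q-function.

    q_target maps a batch of error states (B, n) to Q-values (B, 2),
    one column per action (0 = stay silent, 1 = transmit). It plays the
    role of the slowly updated target network.
    """
    return cost + gamma * q_target(e_next).min(axis=1)
```

In a full training loop, the online network would be regressed onto these targets for the sampled actions, and the parameters behind `q_target` would be copied from the online network only every few hundred steps.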

Abstract

In this letter, we explore the communication-control co-design of discrete-time stochastic linear systems through reinforcement learning. Specifically, we examine a closed-loop system involving two sequential decision-makers: a scheduler and a controller. The scheduler continuously monitors the system's state but transmits it to the controller intermittently to balance the communication cost and control performance. The controller, in turn, determines the control input based on the intermittently received information. Given the partially nested information structure, we show that the optimal control policy follows a certainty-equivalence form. Subsequently, we analyze the qualitative behavior of the scheduling policy. To develop the optimal scheduling policy, we propose InterQ, a deep reinforcement learning algorithm which uses a deep neural network to approximate the Q-function. Through extensive numerical evaluations, we analyze the scheduling landscape and further compare our approach against two baseline strategies: (a) a multi-period periodic scheduling policy, and (b) an event-triggered policy. The results demonstrate that our proposed method outperforms both baselines. The open source implementation can be found at https://github.com/AC-sh/InterQ.
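The scheduler-controller loop described in the abstract can be simulated with a short sketch. Everything specific here is an assumption made for illustration: the system matrices passed in, the function names `discounted_lqr_gain` and `simulate`, the unit-variance Gaussian noise, and the use of a fixed squared-norm threshold on the estimation error in place of the learned scheduling policy (which makes this closer to the event-triggered baseline than to InterQ). The controller applies a certainty-equivalent law $u_k = -K\hat{x}_k$, with $K$ obtained here from a discounted Riccati iteration.

```python
import numpy as np


def discounted_lqr_gain(A, B, Q, R, gamma, iters=500):
    """Iterate the discounted Riccati recursion; return K with u = -K x_hat."""
    P = Q.copy()
    for _ in range(iters):
        G = R + gamma * B.T @ P @ B
        K = gamma * np.linalg.solve(G, B.T @ P @ A)
        P = Q + gamma * A.T @ P @ (A - B @ K)
    return K


def simulate(A, B, Q, R, lam, gamma, threshold, steps=200, seed=0):
    """Run the closed loop and return (discounted cost, number of transmissions)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    K = discounted_lqr_gain(A, B, Q, R, gamma)
    x = np.zeros(n)     # true state, always visible to the scheduler
    pred = np.zeros(n)  # controller's prediction before the scheduling decision
    cost, transmissions = 0.0, 0
    for k in range(steps):
        e = x - pred                      # estimation error seen by the scheduler
        a = int(e @ e > threshold)        # 1 = transmit, 0 = stay silent
        x_hat = x.copy() if a else pred   # estimate resets to x on transmission
        u = -K @ x_hat                    # certainty-equivalent control
        cost += gamma**k * (x @ Q @ x + u @ R @ u + lam * a)
        transmissions += a
        pred = A @ x_hat + B @ u          # open-loop prediction for the next step
        x = A @ x + B @ u + rng.standard_normal(n)
    return cost, transmissions
```

Sweeping `threshold` (or `lam` for a learned policy) and plotting the control part of the cost against the transmission count gives a trade-off curve of the kind the paper compares across policies.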

Paper Structure

This paper contains 8 sections, 2 theorems, 28 equations, 4 figures, 1 algorithm.

Key Result

Lemma 4.1

A sufficient condition for not scheduling a communication at an error state $e$ is given by \ref{eq:noschedule}, and a sufficient condition for scheduling a communication is given by \ref{eq:toschedule}.

Figures (4)

  • Figure 1: Schematic representation of the InterQ Algorithm.
  • Figure 2: Communication-control trade-off curves (top) with a zoomed version (bottom). The outermost and innermost ellipses (in red) are plotted using \ref{eq:toschedule} and \ref{eq:noschedule}, respectively; the blue ellipse approximates the scheduling landscape generated by InterQ.
  • Figure 3: Communication-control trade-off curves with $\lambda=50$ (on the left) and $\lambda=60$ (on the right).
  • Figure 4: Communication-control trade-off curves with $\lambda=60$ and system noise distributed as a uniform random variable.

Theorems & Definitions (4)

  • Lemma 4.1
  • proof
  • Corollary 4.1
  • proof