InterQ: A DQN Framework for Optimal Intermittent Control
Shubham Aggarwal, Dipankar Maity, Tamer Başar
TL;DR
InterQ tackles joint communication-control co-design for discrete-time stochastic linear systems with a scheduler and a controller, balancing transmission cost and control performance via the objective $J=\mathbb{E}\left[\sum_{k=0}^\infty \gamma^k ( x_k^T Q x_k + u_k^T R u_k + \lambda a_k )\right]$, where $a_k \in \{0,1\}$ is the scheduler's transmission decision, $\lambda$ is the price per transmission, and $\gamma$ is the discount factor. By exploiting a separation principle, the authors derive a linear optimal controller that acts on the estimator state and formulate the scheduling problem as a Markov decision process over the estimation error. They then introduce InterQ, a deep Q-learning framework that learns the scheduling policy by approximating the Q-function with a neural network, with training stabilized through experience replay and a target network. Numerical results on a 2D unstable Gauss-Markov-like system show that InterQ achieves better trade-offs between control cost and communication cost than periodic and event-triggered baselines, and reveal an ellipse-like scheduling region consistent with the theoretical threshold conditions. The work provides a practical, open-source RL-based co-design tool for intermittent control, with insights on training stability, memory effects, and robustness to noise distributions.
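To make the objective concrete, the following minimal sketch estimates a truncated version of $J$ for one sample path under a user-supplied scheduling rule. It is an illustration under stated assumptions, not the authors' implementation: the dynamics $x_{k+1} = A x_k + B u_k + w_k$ with standard normal noise, the gain obtained from a discounted Riccati recursion, the finite horizon, the function names `discounted_lqr_gain` and `rollout_cost`, and the example matrices are all hypothetical choices made here.

```python
import numpy as np

def discounted_lqr_gain(A, B, Q, R, gamma, iters=500):
    """Iterate the discounted Riccati recursion and return the feedback gain K."""
    P = Q.copy()
    for _ in range(iters):
        G = R + gamma * B.T @ P @ B
        K = gamma * np.linalg.solve(G, B.T @ P @ A)
        P = Q + gamma * A.T @ P @ (A - B @ K)
    return K

def rollout_cost(A, B, Q, R, lam, gamma, schedule, horizon=200, seed=0):
    """Truncated discounted cost of one sample path.

    schedule(k, e) returns 1 to transmit and 0 to stay silent, where e is the
    controller's estimation error before the decision (assumed interface).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    K = discounted_lqr_gain(A, B, Q, R, gamma)
    x = rng.standard_normal(n)       # assumed random initial state
    x_pred = np.zeros(n)             # controller's prediction of x
    J = 0.0
    for k in range(horizon):
        a = int(schedule(k, x - x_pred))
        x_hat = x if a == 1 else x_pred       # estimate resets on transmission
        u = -K @ x_hat                        # linear control on the estimate
        J += gamma**k * (x @ Q @ x + u @ R @ u + lam * a)
        x = A @ x + B @ u + rng.standard_normal(n)   # assumed Gaussian noise
        x_pred = A @ x_hat + B @ u            # noise-free prediction step
    return J

# Hypothetical 2D unstable example (not the system used in the paper).
A = np.array([[1.2, 0.1], [0.0, 1.1]])
B, Q, R = np.eye(2), np.eye(2), np.eye(2)
J_always = rollout_cost(A, B, Q, R, lam=1.0, gamma=0.95, schedule=lambda k, e: 1)
J_never = rollout_cost(A, B, Q, R, lam=1.0, gamma=0.95, schedule=lambda k, e: 0)
```

Any scheduling rule with the same `(k, e)` signature can be passed as `schedule`, so a learned policy could be compared with simple rules on the same noise realization by reusing the seed.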
Abstract
In this letter, we explore the communication-control co-design of discrete-time stochastic linear systems through reinforcement learning. Specifically, we examine a closed-loop system involving two sequential decision-makers: a scheduler and a controller. The scheduler continuously monitors the system's state but transmits it to the controller intermittently to balance the communication cost and control performance. The controller, in turn, determines the control input based on the intermittently received information. Given the partially nested information structure, we show that the optimal control policy follows a certainty-equivalence form. Subsequently, we analyze the qualitative behavior of the scheduling policy. To develop the optimal scheduling policy, we propose InterQ, a deep reinforcement learning algorithm that uses a deep neural network to approximate the Q-function. Through extensive numerical evaluations, we analyze the scheduling landscape and further compare our approach against two baseline strategies: (a) a multi-period periodic scheduling policy, and (b) an event-triggered policy. The results demonstrate that our proposed method outperforms both baselines. The open-source implementation can be found at https://github.com/AC-sh/InterQ.
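As a rough illustration of the two baseline families named above, the sketch below runs a periodic rule and an event-triggered rule on an estimation-error process. Everything here is an assumption made for illustration rather than taken from the paper or its repository: the error recursion $e_{k+1} = A(1-a_k)e_k + w_k$ with standard normal noise, a single fixed period `T` (a simplification of the multi-period baseline), a squared-norm threshold as the trigger, and the names `periodic`, `event_triggered`, and `evaluate`.

```python
import numpy as np

def periodic(T):
    """Transmit at every T-th step, regardless of the error."""
    return lambda k, e: int(k % T == 0)

def event_triggered(threshold):
    """Transmit when the squared error norm exceeds a fixed threshold."""
    return lambda k, e: int(e @ e > threshold)

def evaluate(schedule, A, steps=5000, seed=0):
    """Return (transmission rate, mean squared post-decision error)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    e = np.zeros(n)
    sent, sq_err = 0, 0.0
    for k in range(steps):
        a = schedule(k, e)
        e_post = (1 - a) * e         # a transmission resets the error to zero
        sent += a
        sq_err += e_post @ e_post
        e = A @ e_post + rng.standard_normal(n)   # assumed Gaussian noise
    return sent / steps, sq_err / steps

# Hypothetical unstable 2D dynamics, chosen only for this example.
A = np.array([[1.2, 0.1], [0.0, 1.1]])
rate_p, mse_p = evaluate(periodic(4), A)
rate_e, mse_e = evaluate(event_triggered(2.0), A)
```

Sweeping `T` and `threshold` and plotting the resulting (rate, error) pairs gives a simple trade-off curve for each baseline, which is the kind of comparison a learned scheduling policy would be measured against.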
