A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms
Weiqin Chen, Mark S. Squillante, Chai Wah Wu, Santiago Paternain
TL;DR
This work addresses the sample inefficiency of reinforcement learning by introducing control-based reinforcement learning (CBRL), which directly learns the unknown variables of an underlying control problem and derives the optimal policy from them. It builds a general theory around a contraction-based CBRL operator and a Q-learning analogue, augmented by a control-policy-variable gradient theorem that ties policy performance to the learned variables, with the linear-quadratic regulator (LQR) as a representative instantiation. The authors prove contraction and convergence properties, establish asymptotic optimality under approximate policy families, and derive a gradient ascent method for updating the learned variables. Empirically, CBRL with LQR (and piecewise-LQR for nonlinear tasks) outperforms strong baselines on Cart Pole, Lunar Lander, Mountain Car, and Pendulum in solution quality, sample complexity, and running time.
Abstract
We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, including convergence and optimality of our analogs of the Bellman operator and Q-learning and a new control-policy-variable gradient theorem, and we derive a specific gradient ascent algorithm based on this theorem within the context of a specific control-theoretic framework. We empirically evaluate the performance of our control-theoretic approach on several classical reinforcement learning tasks, demonstrating significant improvements in solution quality, sample complexity, and running time over state-of-the-art methods.
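To make concrete what deriving a policy from learned control variables can look like in the LQR setting named in the TL;DR, the following is a minimal sketch and not the authors' algorithm. It assumes a discrete-time linear system x' = A x + B u with quadratic cost x^T Q x + u^T R u, and treats `A_hat` and `B_hat` as hypothetical current estimates of the unknown system matrices; the function name `lqr_gain` and all numeric values are illustrative assumptions. Given those estimates, the feedback gain follows from iterating the discrete-time Riccati recursion.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=10000, tol=1e-12):
    """Iterate the discrete-time Riccati recursion and return (K, P).

    The resulting policy is the linear feedback u = -K x.
    """
    P = Q.copy()
    for _ in range(iters):
        # Gain implied by the current cost-to-go matrix P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A^T P A - A^T P B K.
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical "learned" estimates for a double integrator (illustrative only).
A_hat = np.array([[1.0, 0.1],
                  [0.0, 1.0]])
B_hat = np.array([[0.0],
                  [0.1]])
Q = np.eye(2)          # state cost (assumed)
R = np.array([[1.0]])  # control cost (assumed)

K, P = lqr_gain(A_hat, B_hat, Q, R)

# Spectral radius of the closed-loop matrix; below 1 means the policy stabilizes
# the estimated system.
rho = np.max(np.abs(np.linalg.eigvals(A_hat - B_hat @ K)))
```

In a scheme of this kind, a learning step would adjust `A_hat` and `B_hat` (for instance by gradient ascent on observed return) and recompute `K`, so the policy is always the optimal controller for the current estimates.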
