Interacting Particle Systems for Fast Linear Quadratic RL
Anant A. Joshi, Heng-Sheng Chang, Amirhossein Taghvaei, Prashant G. Mehta, Sean P. Meyn
TL;DR
This work develops a simulator-based, interacting-particle framework for fast learning of optimal linear-quadratic (LQ) controllers in continuous time. Particle trajectories are coupled through a mean-field interaction to construct a dual ensemble Kalman filter (EnKF) that tracks the dual Riccati solution. The construction yields provable finite-$N$ error bounds in which the mean square error scales as $1/N$, and it allows gains to be computed online without an initial stabilizing policy. The main contributions are an extension of EnKF-based analysis to stochastic and robust settings, sample-complexity comparisons with state-of-the-art RL and LQG methods, and numerical experiments showing substantial speedups over policy-gradient and path-integral approaches. The results are relevant to high-dimensional RL in control and robotics, where high-fidelity simulators can be used to learn near-optimal linear controllers rapidly.
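For orientation, one standard way to read the phrase "dual Riccati solution" is sketched below. This is a textbook-style derivation and not taken from the paper: the deterministic finite-horizon setting, the matrices $A$, $B$, $Q$, $R$, the terminal weight $G$, the horizon $T$, and the symbols $P_t$, $S_t$, $K_t$ are all notation assumed here for illustration.

```latex
% Assumed setting: \dot{x}_t = A x_t + B u_t, with cost
% \int_0^T ( x_t^\top Q x_t + u_t^\top R u_t ) \, dt + x_T^\top G x_T .
\[
\begin{aligned}
-\dot{P}_t &= A^\top P_t + P_t A + Q - P_t B R^{-1} B^\top P_t, & P_T &= G, \\
u_t &= -K_t x_t, & K_t &= R^{-1} B^\top P_t .
\end{aligned}
\]
% If P_t is invertible, define the dual variable S_t := P_t^{-1}.
% Using \dot{S}_t = -S_t \dot{P}_t S_t :
\[
\begin{aligned}
\dot{S}_t &= A S_t + S_t A^\top + S_t Q S_t - B R^{-1} B^\top, & S_T &= G^{-1}, \\
K_t &= R^{-1} B^\top S_t^{-1} .
\end{aligned}
\]
```

Under these assumptions, $S_t$ has the form of a covariance-type equation, which suggests why an ensemble whose empirical covariance approximates $S_t$ could be used to recover the gain $K_t$ directly from simulated particles.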
Abstract
This paper is concerned with the design of algorithms based on systems of interacting particles to represent, approximate, and learn the optimal control law for reinforcement learning (RL). The primary contribution is that convergence rates are greatly accelerated by the interactions between particles. The analysis focuses on the linear quadratic stochastic optimal control problem, for which a complete and novel theory is presented. In addition to the new algorithm, sample complexity bounds are obtained, and it is shown that the mean square error scales as $1/N$, where $N$ is the number of particles. The theoretical results and algorithms are illustrated with numerical experiments and comparisons with other recent approaches, which demonstrate the faster convergence of the proposed algorithm.
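To make the $1/N$ mean-square-error scaling concrete, the following minimal sketch estimates a variance from $N$ independent Gaussian samples and measures how the error shrinks as $N$ grows. It is a generic Monte Carlo illustration, not the paper's interacting-particle algorithm: the function name `empirical_variance_mse`, the value of `sigma`, the trial count, and the particle counts are all hypothetical choices made here.

```python
import numpy as np


def empirical_variance_mse(n_particles, n_trials=2000, sigma=1.5, seed=0):
    """Monte Carlo estimate of the MSE of the sample variance.

    Hypothetical helper for illustration only: each trial draws
    n_particles i.i.d. N(0, sigma^2) samples and estimates sigma^2.
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, sigma, size=(n_trials, n_particles))
    estimates = samples.var(axis=1, ddof=1)  # unbiased sample variance per trial
    return float(np.mean((estimates - sigma**2) ** 2))


particle_counts = [50, 100, 200, 400, 800]
mse = {n: empirical_variance_mse(n) for n in particle_counts}

for n in particle_counts:
    # If the MSE scales as 1/N, the product N * MSE stays roughly constant.
    print(f"N={n:4d}  MSE={mse[n]:.5f}  N*MSE={n * mse[n]:.3f}")
```

For Gaussian samples the MSE of the unbiased sample variance is $2\sigma^4/(N-1)$, so the last column should hover near $2\sigma^4$ up to Monte Carlo noise. The particles in the paper interact, so this independent-sample picture conveys only the form of the scaling, not the analysis behind it.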
