Interacting Particle Systems for Fast Linear Quadratic RL

Anant A Joshi, Heng-Sheng Chang, Amirhossein Taghvaei, Prashant G Mehta, Sean P. Meyn

TL;DR

This work develops a simulator-based, interacting-particle framework for fast learning of optimal linear-quadratic controllers in continuous time. By coupling particle trajectories through a mean-field interaction, it constructs a dual EnKF that tracks the dual Riccati solution and yields provable finite-$N$ error bounds with $1/N$ scaling, while enabling online computation of gains without requiring an initial stabilizing policy. The main contributions include extending EnKF-based analysis to stochastic/robust settings, establishing sample-complexity comparisons with state-of-the-art RL/LQG methods, and demonstrating substantial speedups in numerical experiments relative to policy-gradient and path-integral approaches. The results have practical implications for efficient, high-dimensional RL in control and robotics, where high-fidelity simulators can be leveraged to rapidly learn near-optimal linear controllers. Overall, the paper provides a principled, scalable, and simulator-friendly route to fast RL in LQ settings through interacting particle systems and mean-field couplings.

Abstract

This paper is concerned with the design of algorithms based on systems of interacting particles to represent, approximate, and learn the optimal control law for reinforcement learning (RL). The primary contribution is that convergence rates are greatly accelerated by the interactions between particles. Theory focuses on the linear quadratic stochastic optimal control problem for which a complete and novel theory is presented. Apart from the new algorithm, sample complexity bounds are obtained, and it is shown that the mean square error scales as $1/N$ where $N$ is the number of particles. The theoretical results and algorithms are illustrated with numerical experiments and comparisons with other recent approaches, where the faster convergence of the proposed algorithm is numerically demonstrated.
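
The $1/N$ scaling of the mean square error mentioned above can be illustrated with a much simpler experiment than the paper's algorithm. The sketch below is not the dual EnKF: it only estimates the variance of a scalar Gaussian from $N$ independent particles and measures how the mean square error of that estimate shrinks as $N$ grows. The function names, the choice $\sigma = 1$, and the trial counts are assumptions made for illustration.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance of a list of floats."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def mse_of_variance_estimate(n_particles, n_trials=2000, sigma=1.0, seed=0):
    """Monte Carlo estimate of E[(sample variance - sigma^2)^2].

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    rng = random.Random(seed)
    true_var = sigma ** 2
    total = 0.0
    for _ in range(n_trials):
        particles = [rng.gauss(0.0, sigma) for _ in range(n_particles)]
        total += (sample_variance(particles) - true_var) ** 2
    return total / n_trials


if __name__ == "__main__":
    # For Gaussian samples the exact MSE is 2 * sigma^4 / (N - 1),
    # so (N - 1) * MSE should stay roughly constant as N grows.
    for n in (10, 40, 160):
        mse = mse_of_variance_estimate(n)
        print(f"N={n:4d}  MSE={mse:.5f}  (N-1)*MSE={(n - 1) * mse:.3f}")
```

If the product $(N-1)\cdot\mathrm{MSE}$ stays roughly constant across the three particle counts, the error is decaying at the $1/N$ rate. The paper's result concerns the analogous rate for the particle approximation of the Riccati solution, which this toy example does not reproduce.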

Paper Structure

This paper contains 36 sections, 4 theorems, 77 equations, 6 figures, 3 tables, 2 algorithms.

Key Result

theorem 1

Consider the dual EnKF (eq:dual_enkf_intro) under Assumption (assn:model). Then for $N \ge d+1$ and each fixed $t$, a finite-$N$ error bound holds [equation not recovered], where $C_1, C_2, C_3, C_4$ are model-dependent but time-independent constants. For the average cost problem, there exists a constant $\lambda > 0$ such that the solution converges exponentially to the stationary solution [equation not recovered].

Figures (6)

  • Figure 1: Comparison of the numerical solutions obtained from the EnKF, the DRE, and the ARE. The plots are in order: (a) LQG, (b) LEQG ($\theta > 0$), (c) LEQG ($\theta < 0$).
  • Figure 2: Relative error in approximating the solution of the ARE by the dual EnKF.
  • Figure 3: Comparison of dual EnKF with: (a) [K19] for infinite-horizon LQG; and (b) [Z21] for finite-horizon LEQG. See the comparison section (sec:comp) for details.
  • Figure 4: Comparison of dual EnKF with path integral control for a spring-mass-damper system.
  • Figure 5: Performance of all three controllers on a stable spring-mass-damper system.
  • ...and 1 more figure

Theorems & Definitions (14)

  • definition 1: Simulator
  • remark 1: Simulations and RL
  • definition 2: Q-function
  • theorem 1
  • proof
  • definition 3: Empirical Q-function
  • remark 2
  • proposition 1
  • proof
  • remark 3
  • ...and 4 more