Interacting Particle Systems for Fast Linear Quadratic RL

Anant A Joshi; Heng-Sheng Chang; Amirhossein Taghvaei; Prashant G Mehta; Sean P. Meyn

TL;DR

This work develops a simulator-based, interacting-particle framework for fast learning of optimal linear-quadratic controllers in continuous time. By coupling particle trajectories through a mean-field interaction, it constructs a dual EnKF that tracks the dual Riccati solution and yields provable finite-$N$ error bounds with $1/N$ scaling, while enabling online computation of gains without requiring an initial stabilizing policy. The main contributions include extending EnKF-based analysis to stochastic/robust settings, establishing sample-complexity comparisons with state-of-the-art RL/LQG methods, and demonstrating substantial speedups in numerical experiments relative to policy-gradient and path-integral approaches. The results have practical implications for efficient, high-dimensional RL in control and robotics, where high-fidelity simulators can be leveraged to rapidly learn near-optimal linear controllers. Overall, the paper provides a principled, scalable, and simulator-friendly route to fast RL in LQ settings through interacting particle systems and mean-field couplings.

Abstract

This paper is concerned with the design of algorithms based on systems of interacting particles to represent, approximate, and learn the optimal control law for reinforcement learning (RL). The primary contribution is that convergence rates are greatly accelerated by the interactions between particles. Theory focuses on the linear quadratic stochastic optimal control problem for which a complete and novel theory is presented. Apart from the new algorithm, sample complexity bounds are obtained, and it is shown that the mean square error scales as $1/N$ where $N$ is the number of particles. The theoretical results and algorithms are illustrated with numerical experiments and comparisons with other recent approaches, where the faster convergence of the proposed algorithm is numerically demonstrated.
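To give a feel for what a mean square error that "scales as $1/N$" looks like numerically, here is a minimal sketch. It is not the paper's interacting-particle algorithm: it uses $N$ independent standard Gaussian samples, with no interaction between them, and measures the mean square error of the sample variance, a textbook estimator whose error is known to decay like $1/N$. The function name `mse_of_sample_variance`, the trial count, and the particle counts are illustrative assumptions, not values from the paper.

```python
import random


def mse_of_sample_variance(n_particles, n_trials=2000, seed=0):
    """Monte Carlo estimate of E[(s_N^2 - 1)^2] for N i.i.d. N(0, 1) samples.

    Illustrative only: independent samples, not the coupled particles
    of the dual EnKF described in the paper.
    """
    rng = random.Random(seed)
    total_sq_error = 0.0
    for _ in range(n_trials):
        samples = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
        mean = sum(samples) / n_particles
        # Unbiased sample variance; the true variance is 1.
        s2 = sum((x - mean) ** 2 for x in samples) / (n_particles - 1)
        total_sq_error += (s2 - 1.0) ** 2
    return total_sq_error / n_trials


# Quadrupling N should cut the mean square error by roughly a factor of 4.
results = {n: mse_of_sample_variance(n) for n in (10, 40, 160)}
for n, mse in results.items():
    print(f"N={n:4d}  MSE={mse:.5f}  N*MSE={n * mse:.3f}")
```

If the error scales as $1/N$, the product `N*MSE` printed in the last column stays roughly constant as `N` grows. The same diagnostic can be applied to any particle-based estimate.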

Paper Structure (36 sections, 4 theorems, 77 equations, 6 figures, 3 tables, 2 algorithms)

Introduction
Algorithm proposed in this paper.
Contributions:
Problem formulation
Riccati equation and the Q function
Interacting particle algorithm
Dual EnKF for approximating solution of DRE
Algorithm for approximating optimal control
Comparison of sample complexity to related works
Conceptual comparison to path integral control
Numerical experiments and comparisons
Numerical illustration of error formulas (\ref{eq:error-S}), (\ref{eq:error-Sinf})
Numerical comparisons with prior work
Theory
Simulator
...and 21 more sections

Key Result

theorem 1

Consider the dual EnKF (eq:dual_enkf_intro) under Assumption assn:model. Then for $N \ge d+1$ and each fixed $t$, a finite-$N$ error bound holds in which $C_1, C_2, C_3, C_4$ are model-dependent but time-independent constants. For the average cost problem, there exists a constant $\lambda > 0$ such that the solution converges exponentially to the stationary solution.

Figures (6)

Figure 1: Comparison of the numerical solutions obtained from the EnKF, the DRE, and the ARE. The panels are, in order: (a) LQG, (b) LEQG ($\theta > 0$), (c) LEQG ($\theta < 0$).
Figure 2: Relative error in approximating the solution of the ARE by the dual EnKF.
Figure 3: Comparison of dual EnKF with: (a) [K19] for infinite horizon LQG; and (b) [Z21] for finite horizon LEQG. See Section \ref{sec:comp} for details.
Figure 4: Comparison of dual EnKF with path integral control for a spring-mass-damper system.
Figure 5: Performance of all three controllers on a stable spring-mass-damper system.
...and 1 more figure

Theorems & Definitions (14)

definition 1: Simulator
remark 1: Simulations and RL
definition 2: Q-function
theorem 1
proof
definition 3: Empirical Q-function
remark 2
proposition 1
proof
remark 3
...and 4 more
