Sample Complexity of Linear Quadratic Regulator Without Initial Stability

Amirreza Neshaei Moghaddam, Alex Olshevsky, Bahman Gharesifard

TL;DR

This work targets learning-based LQR with unknown dynamics by proposing a receding-horizon, model-free policy gradient algorithm that does not require a stabilizing initial policy or two-point gradient estimates. A hierarchical outer loop constructs surrogate finite-horizon costs, while an inner loop uses a one-point gradient estimator to optimize each horizon, with theoretical guarantees grounded in the contraction of the Riccati operator under the Riemannian distance. The authors establish a uniform $\tilde{O}(\varepsilon^{-2})$ sample complexity, independent of problem-specific constants, by leveraging a refined analysis of error propagation. Empirical results on a standard LQR benchmark show favorable sample complexity and policy accuracy, often outperforming prior two-point methods, and the framework opens paths to extensions to nonlinear and partially observed systems.
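
To make the idea of a one-point gradient estimator concrete, the following is a minimal sketch and not the paper's algorithm. It uses the generic zeroth-order estimate $(d/r)\, f(K + rU)\, U$ with $U$ uniform on the unit sphere, applied to a hypothetical single-stage quadratic cost. The matrices `A`, `B`, `Q`, `R`, `P`, the smoothing radius, and the sample count are all illustrative assumptions, and the cost is evaluated exactly rather than from noisy rollouts.

```python
import numpy as np

def stage_cost(K, A, B, Q, R, P):
    """Expected one-stage cost for x ~ N(0, I) under u = -K x:
    trace(Q + K'RK + (A - BK)' P (A - BK))."""
    Acl = A - B @ K
    return np.trace(Q + K.T @ R @ K + Acl.T @ P @ Acl)

def one_point_gradient(cost, K, radius, num_samples, rng):
    """Average of (d / radius) * cost(K + radius * U) * U, where U is drawn
    uniformly from the unit Frobenius sphere. Each sample queries the cost once."""
    d = K.size
    grad = np.zeros_like(K)
    for _ in range(num_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        grad += (d / radius) * cost(K + radius * U) * U
    return grad / num_samples

# Illustrative (assumed) problem data; A is deliberately unstable.
A = np.array([[1.2, 0.5], [0.0, 1.1]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
P = np.eye(2)          # assumed terminal cost-to-go matrix
K0 = np.zeros((1, 2))  # no stabilizing gain is needed for a finite-horizon stage

rng = np.random.default_rng(0)
grad_est = one_point_gradient(
    lambda K: stage_cost(K, A, B, Q, R, P), K0, radius=1.0, num_samples=40000, rng=rng
)
# Closed-form gradient of the same quadratic cost, for comparison.
grad_true = 2.0 * ((R + B.T @ P @ B) @ K0 - B.T @ P @ A)
print("estimate:", grad_est)
print("exact:   ", grad_true)
```

For a quadratic cost the smoothed gradient coincides with the true gradient, so the only error here is Monte Carlo noise. That noise is large per sample, which is why one-point schemes typically need many samples or variance reduction.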

Abstract

Inspired by REINFORCE, we introduce a novel receding-horizon algorithm for the Linear Quadratic Regulator (LQR) problem with unknown dynamics. Unlike prior methods, our algorithm avoids reliance on two-point gradient estimates while maintaining the same order of sample complexity. Furthermore, it eliminates the restrictive requirement of starting with a stable initial policy, broadening its applicability. Beyond these improvements, we introduce a refined analysis of error propagation through the contraction of the Riccati operator under the Riemannian distance. This refinement leads to a better sample complexity and ensures improved convergence guarantees.
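
The contraction property mentioned above can be probed numerically. The sketch below assumes the standard discrete-time Riccati operator $\mathcal{R}(P) = Q + A^\top P A - A^\top P B (R + B^\top P B)^{-1} B^\top P A$ and the affine-invariant Riemannian distance $\delta(X, Y) = \|\log(X^{-1/2} Y X^{-1/2})\|_F$; the paper's exact definitions may differ in convention. The matrices and the helper `random_spd` are hypothetical, chosen only for illustration.

```python
import numpy as np

def riccati_operator(P, A, B, Q, R):
    """Standard discrete-time Riccati map (assumed form, see lead-in)."""
    BtP = B.T @ P
    correction = A.T @ P @ B @ np.linalg.solve(R + BtP @ B, BtP @ A)
    return Q + A.T @ P @ A - correction

def riemannian_distance(X, Y):
    """sqrt(sum_i log^2 lambda_i), with lambda_i the eigenvalues of X^{-1} Y."""
    L = np.linalg.cholesky(X)
    Linv_Y = np.linalg.solve(L, Y)        # L^{-1} Y
    M = np.linalg.solve(L, Linv_Y.T)      # L^{-1} Y L^{-T}, symmetric
    M = 0.5 * (M + M.T)                   # remove round-off asymmetry
    eigs = np.linalg.eigvalsh(M)
    return float(np.sqrt(np.sum(np.log(eigs) ** 2)))

def random_spd(n, rng):
    """Hypothetical helper: a random symmetric positive definite matrix."""
    G = rng.standard_normal((n, n))
    return G @ G.T + np.eye(n)

# Illustrative (assumed) data with an invertible A.
A = np.array([[1.2, 0.5], [0.0, 1.1]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

rng = np.random.default_rng(1)
pairs = []
for _ in range(5):
    X, Y = random_spd(2, rng), random_spd(2, rng)
    before = riemannian_distance(X, Y)
    after = riemannian_distance(
        riccati_operator(X, A, B, Q, R), riccati_operator(Y, A, B, Q, R)
    )
    pairs.append((before, after))
    print(f"delta(X, Y) = {before:.4f}   delta(R(X), R(Y)) = {after:.4f}")
```

In each trial the distance after applying the operator should not exceed the distance before. This is a sanity check on random inputs, not a substitute for the lemma's proof or its explicit contraction factor.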

Paper Structure

This paper contains 11 sections, 15 theorems, 173 equations, 3 figures, 1 algorithm.

Key Result

Lemma 2.1

Consider the operator $\mathcal{R}$ defined in eq: def_Riccati_operator. If Assumption ass: A_invertible holds, then for any symmetric positive definite matrices $X, Y \in \mathbb{R}^{n \times n}$, we have

Figures (3)

  • Figure 1: A brief overview of the nested-loop structure of Algorithm \ref{alg:RHPG-RL}.
  • Figure 2: Roadmap of technical results in Section \ref{sec: inner loop}.
  • Figure 3: Simulation results showing sample complexity and policy optimality gap.

Theorems & Definitions (24)

  • Lemma 2.1
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Remark 4.1
  • Proposition 4.1
  • proof
  • Lemma 4.1
  • proof
  • Lemma 4.2
  • ...and 14 more