Sample Complexity of Linear Quadratic Regulator Without Initial Stability
Amirreza Neshaei Moghaddam, Alex Olshevsky, Bahman Gharesifard
TL;DR
This work targets learning-based LQR with unknown dynamics by proposing a receding-horizon, model-free policy gradient algorithm that does not require a stabilizing initial policy or two-point gradient estimates. A hierarchical outer loop constructs surrogate finite-horizon costs, while an inner loop uses a one-point gradient estimator to optimize each horizon, with theoretical guarantees grounded in the contraction of the Riccati operator under the Riemannian distance. The authors establish a uniform $\tilde{O}(\varepsilon^{-2})$ sample complexity, independent of problem-specific constants, by leveraging a refined analysis of error propagation. Empirical results on a standard LQR benchmark show favorable sample complexity and policy accuracy, often outperforming prior two-point methods, and the framework opens a path toward extensions to nonlinear and partially observed systems.
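To make the one-point estimator concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-step surrogate cost with a known terminal weight `P_next`, Gaussian initial states, and a smoothing radius `r`. The system matrices, the gain `K`, and all function names are hypothetical choices for illustration. Each sample queries the cost once, at a randomly perturbed gain, which is what distinguishes a one-point estimate from a two-point one.

```python
import numpy as np

def perturbed_costs(K, U, x0, r, A, B, Q, R, P_next):
    """Sampled cost x0'Q x0 + u'R u + x1' P_next x1 for each perturbed gain K + r*U[b]."""
    K_pert = K + r * U                                  # shape (batch, m, n)
    u = -np.einsum("bmn,bn->bm", K_pert, x0)            # u = -(K + r U) x0
    x1 = x0 @ A.T + u @ B.T                             # next state A x0 + B u

    def quad(v, M):
        return np.einsum("bi,ij,bj->b", v, M, v)        # batched v' M v

    return quad(x0, Q) + quad(u, R) + quad(x1, P_next)

def one_point_gradient(K, r, num_samples, rng, A, B, Q, R, P_next):
    """Average of (d / r) * cost(K + r U) * U over random directions U."""
    m, n = K.shape
    d = m * n
    U = rng.standard_normal((num_samples, m, n))
    U /= np.linalg.norm(U, axis=(1, 2), keepdims=True)  # uniform on the unit Frobenius sphere
    x0 = rng.standard_normal((num_samples, n))          # assumed x0 ~ N(0, I)
    costs = perturbed_costs(K, U, x0, r, A, B, Q, R, P_next)
    return (d / r) * np.mean(costs[:, None, None] * U, axis=0)

# Hypothetical open-loop unstable system (A has an eigenvalue at 1.2).
A = np.array([[1.2, 0.5], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R, P_next = np.eye(2), np.eye(1), np.eye(2)
K = np.array([[0.3, -0.2]])

# Closed-form gradient of the expected one-step cost, used only as a reference.
exact = 2.0 * ((R + B.T @ P_next @ B) @ K - B.T @ P_next @ A)
estimate = one_point_gradient(K, 0.5, 200_000, np.random.default_rng(0),
                              A, B, Q, R, P_next)
print("exact:   ", exact)
print("estimate:", estimate)
```

Because this surrogate cost is quadratic in the gain, smoothing adds no bias here, so the estimate differs from the closed-form gradient only through sampling noise. The large sample count reflects the high variance typical of one-point estimates.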
Abstract
Inspired by REINFORCE, we introduce a novel receding-horizon algorithm for the Linear Quadratic Regulator (LQR) problem with unknown dynamics. Unlike prior methods, our algorithm avoids reliance on two-point gradient estimates while maintaining the same order of sample complexity. Furthermore, it eliminates the restrictive requirement of starting with a stabilizing initial policy, broadening its applicability. Beyond these improvements, we introduce a refined analysis of error propagation through the contraction of the Riccati operator under the Riemannian distance. This refinement yields better sample complexity and stronger convergence guarantees.
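The contraction property behind the error-propagation analysis can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's proof technique: it takes the Riemannian distance to be the standard affine-invariant metric on positive definite matrices, $\delta(P_1, P_2) = \big(\sum_i \log^2 \lambda_i(P_1^{-1} P_2)\big)^{1/2}$, and uses a hypothetical two-state system with $Q \succ 0$. The function names are placeholders.

```python
import numpy as np

def riccati_operator(P, A, B, Q, R):
    """One step of the discrete-time Riccati recursion applied to P."""
    BtP = B.T @ P
    gain = np.linalg.solve(R + BtP @ B, BtP @ A)        # (R + B'PB)^{-1} B'PA
    P_new = Q + A.T @ P @ A - A.T @ P @ B @ gain
    return 0.5 * (P_new + P_new.T)                      # remove rounding asymmetry

def riemannian_distance(P1, P2):
    """Affine-invariant distance between positive definite P1 and P2."""
    L = np.linalg.cholesky(P1)                          # P1 = L L'
    W = np.linalg.solve(L, P2)                          # L^{-1} P2
    M = np.linalg.solve(L, W.T)                         # L^{-1} P2 L^{-T}
    eigs = np.linalg.eigvalsh(0.5 * (M + M.T))          # same spectrum as P1^{-1} P2
    return float(np.sqrt(np.sum(np.log(eigs) ** 2)))

# Hypothetical open-loop unstable system with Q and R positive definite.
A = np.array([[1.2, 0.5], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

# Two different positive definite starting points.
P1 = np.eye(2)
P2 = np.array([[6.0, 1.0], [1.0, 3.0]])

distances = [riemannian_distance(P1, P2)]
for _ in range(15):
    P1 = riccati_operator(P1, A, B, Q, R)
    P2 = riccati_operator(P2, A, B, Q, R)
    distances.append(riemannian_distance(P1, P2))

for k, dist in enumerate(distances):
    print(k, dist)
```

The printed distances should shrink from one iteration to the next, which is the behavior that lets an error introduced at one horizon step be damped, rather than amplified, as it propagates through the remaining steps.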
