Convergence and Robustness of Value and Policy Iteration for the Linear Quadratic Regulator

Bowen Song, Chenxuan Wu, Andrea Iannelli

TL;DR

This work analyzes the convergence and robustness of discrete-time value iteration (VI) and policy iteration (PI) for the linear quadratic regulator (LQR). It extends known exponential-convergence results by establishing local exponential convergence around the optimal kernel $P^*$ and derives input-to-state stability (ISS)-style bounds to quantify robustness when system matrices $(A,B)$ are estimated with bounded errors. The results show that both VI and PI preserve stability and converge to the optimal $P^*$ under bounded perturbations, with PI offering monotonic convergence in a region around $P^*$. Numerical simulations on a three-dimensional LQR illustrate the theoretical findings and highlight implications for approximate dynamic programming and data-driven control.

Abstract

This paper revisits and extends the convergence and robustness properties of value and policy iteration algorithms for discrete-time linear quadratic regulator problems. In the model-based case, we extend current results concerning the region of exponential convergence of both algorithms. In the case where there is uncertainty on the value of the system matrices, we provide input-to-state stability results capturing the effect of model parameter uncertainties. Our findings offer new insights into these algorithms at the heart of several approximate dynamic programming schemes, highlighting their convergence and robustness behaviors. Numerical examples illustrate the significance of some of the theoretical results.
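As a rough illustration of the two algorithms named above, the following minimal sketch runs VI and PI on a scalar discrete-time LQR problem. The scalar setting, the numerical values of `a`, `b`, `q`, `r`, the initial gain, and all function names are assumptions chosen for illustration; they are not taken from the paper, which treats the matrix case and reports a three-dimensional example.

```python
import math


def riccati_step(p, a, b, q, r):
    """One VI step: the scalar discrete-time Riccati recursion."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)


def greedy_gain(p, a, b, r):
    """Policy improvement: gain that is greedy with respect to kernel p."""
    return a * b * p / (r + b * b * p)


def evaluate_gain(k, a, b, q, r):
    """Policy evaluation: solve the scalar Lyapunov equation for gain k."""
    a_cl = a - b * k  # closed-loop dynamics under u = -k x
    if abs(a_cl) >= 1.0:
        raise ValueError("gain is not stabilizing")
    return (q + r * k * k) / (1.0 - a_cl * a_cl)


def value_iteration(p0, a, b, q, r, steps):
    """Return the VI iterates p_0, p_1, ..., p_steps."""
    ps = [p0]
    for _ in range(steps):
        ps.append(riccati_step(ps[-1], a, b, q, r))
    return ps


def policy_iteration(k0, a, b, q, r, steps):
    """Return the PI kernels, starting from a stabilizing gain k0."""
    k, ps = k0, []
    for _ in range(steps):
        p = evaluate_gain(k, a, b, q, r)
        ps.append(p)
        k = greedy_gain(p, a, b, r)
    return ps


def optimal_kernel(a, b, q, r):
    """Positive root of b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0."""
    c1 = r - q * b * b - a * a * r
    return (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q * r)) / (2.0 * b * b)


# Hypothetical problem data: an open-loop unstable scalar system.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p_star = optimal_kernel(a, b, q, r)
vi = value_iteration(0.0, a, b, q, r, steps=60)   # starts from p_0 = 0
pi = policy_iteration(1.0, a, b, q, r, steps=10)  # k_0 = 1 gives a - b*k_0 = 0.2

print("P*           :", p_star)
print("VI error     :", abs(vi[-1] - p_star))
print("PI error     :", abs(pi[-1] - p_star))
```

In this sketch PI needs a stabilizing initial gain, whereas VI can start from any nonnegative kernel, which mirrors the usual trade-off between the two schemes.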

Paper Structure

This paper contains 15 sections, 10 theorems, 35 equations, 3 figures, 4 algorithms.

Key Result

Theorem 1

Properties of VI (SemiCone). If the system dynamics $(A,B)$ are stabilizable, then for all $P_0 \succeq 0$:

Figures (3)

  • Figure 1: 2-dimensional Graphic Representation
  • Figure 2: Convergence of VI and PI
  • Figure 3: Robustness of VI and PI

Theorems & Definitions (18)

  • Definition 1: Stability of gain $K$ and kernel $P$
  • Theorem 1
  • Theorem 2
  • Lemma 1: Stability of $P$ around $P^*$
  • Theorem 3: Local exponential convergence of VI
  • proof
  • Remark 1
  • Corollary 1: Exponential Convergence of VI
  • Theorem 4: Local exponential convergence of PI
  • Remark 2
  • ...and 8 more