On the (almost) Global Exponential Convergence of the Overparameterized Policy Optimization for the LQR Problem

Moh Kamalul Wafi, Arthur Castello B. de Oliveira, Eduardo D. Sontag

TL;DR

The paper studies how gradient-flow convergence in nonconvex policy optimization for the LQR depends on problem formulation, showing that a simple reparameterization can upgrade a saturated Polyak–Łojasiewicz inequality (satPŁI) to a global one (gPŁI) and thereby yield global exponential cost stability (GECS). For a simplified overparameterized LQR, it derives an explicit rate μ_γ(c,γ) that depends on an imbalance measure c and a region parameter γ, proving almost global exponential convergence and showing that larger imbalance speeds up convergence. Numerical experiments on both stable and unstable LTI systems corroborate GECS for the overparameterized approach and show slower convergence near saddle manifolds (c ≈ 0). The work demonstrates that careful formulation and overparameterization can qualitatively and quantitatively improve gradient-flow convergence in nonconvex control problems, with implications for designing learning-based control algorithms.

Abstract

In this work we study the convergence of gradient methods for nonconvex optimization problems, specifically the effect of the problem formulation on the convergence behavior of the solution of a gradient flow. We show through a simple example that, surprisingly, the gradient flow solution can be exponentially or asymptotically convergent, depending on how the problem is formulated. We then deepen the analysis and show that a policy optimization strategy for the continuous-time linear quadratic regulator (LQR), which is known to exhibit only asymptotic convergence globally, exhibits almost global exponential convergence if the problem is overparameterized through a linear feed-forward neural network (LFFNN). We prove that this qualitative improvement always happens for a simplified version of the LQR problem and derive explicit convergence rates for the gradient flow. Finally, we show through numerical simulations that both the qualitative improvement and the quantitative rate gains persist in the general LQR.
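To make the comparison in the abstract concrete, the following is a minimal sketch on a scalar toy problem, not the paper's simplified LQR or its experiments. It assumes the plant x' = A x + B u with u = -k x, unit weights, and x(0) = 1, and it overparameterizes the gain as k = w2 * w1 (a two-layer scalar linear network). The quantity c = w1² - w2² is used here as an assumed scalar analogue of an imbalance measure; the paper's own definition is not given in this summary. All names, parameter values, and the forward-Euler discretization of the gradient flow are illustrative choices.

```python
import math

# Assumed scalar plant x' = A x + B u and cost weights (illustrative values).
A, B, Q, R = 1.0, 1.0, 1.0, 1.0


def lqr_cost(k):
    """Cost of u = -k x with x(0) = 1: integral of (Q + R k^2) x(t)^2 dt."""
    margin = B * k - A  # closed loop is stable only if margin > 0
    if margin <= 0.0:
        return math.inf
    return (Q + R * k * k) / (2.0 * margin)


def lqr_grad(k):
    """Derivative of lqr_cost with respect to k (valid for B*k > A)."""
    margin = B * k - A
    return (2.0 * R * k * margin - B * (Q + R * k * k)) / (2.0 * margin ** 2)


def optimal_cost():
    """Positive root of the scalar Riccati equation 2 A p - B^2 p^2 / R + Q = 0."""
    return R * (A + math.sqrt(A * A + B * B * Q / R)) / (B * B)


def simulate(w1=2.0, w2=1.0, h=1e-3, steps=10000):
    """Forward-Euler approximation of both gradient flows from the same gain."""
    j_star = optimal_cost()
    k = w1 * w2  # the standard formulation starts from the same gain
    gap_std, gap_ovp, c_hist = [], [], []
    for _ in range(steps + 1):
        gap_std.append(lqr_cost(k) - j_star)
        gap_ovp.append(lqr_cost(w1 * w2) - j_star)
        c_hist.append(w1 * w1 - w2 * w2)  # assumed imbalance analogue
        # Standard formulation: k' = -J'(k).
        k -= h * lqr_grad(k)
        # Overparameterized formulation: chain rule through k = w2 * w1.
        g = lqr_grad(w1 * w2)
        w1, w2 = w1 - h * g * w2, w2 - h * g * w1
    return gap_std, gap_ovp, c_hist


if __name__ == "__main__":
    gap_std, gap_ovp, c_hist = simulate()
    for step in (0, 1000, 2000, 5000):
        print(step, gap_std[step], gap_ovp[step], c_hist[step])
```

In this toy setting the overparameterized gain obeys the same scalar dynamics as the standard one, rescaled in time by w1² + w2², and c is conserved by the exact flow (the Euler steps introduce only a small drift). That is one way to see why the factorization can change the convergence rate without changing the minimizer.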

Paper Structure

This paper contains 8 sections, 7 theorems, 38 equations, 3 figures.

Key Result

Lemma 1

The gradient flow (eq:GF) of a minimization problem (eq:OFGF:opt_prob) is GECS if and only if its loss function satisfies a gPŁI.
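The "if" direction of this kind of statement follows from a standard argument, sketched here under assumed notation that may differ from the paper's: a loss $L$ with infimum $L^*$, the gradient flow $\dot\theta = -\nabla L(\theta)$, and a global PŁ inequality with constant $\mu > 0$. The normalization of $\mu$ (for example a factor of 2) is an assumption and may not match the paper's Definition 2.

```latex
\begin{align*}
  &\text{Assumed gPŁI:} &
  \|\nabla L(\theta)\|^2 &\ge \mu \,\bigl(L(\theta) - L^*\bigr)
  \quad \text{for all } \theta, \\
  &\text{Along the flow:} &
  \frac{d}{dt}\bigl(L(\theta(t)) - L^*\bigr)
  &= \nabla L(\theta(t))^\top \dot\theta(t)
   = -\|\nabla L(\theta(t))\|^2 \\
  & & &\le -\mu \,\bigl(L(\theta(t)) - L^*\bigr), \\
  &\text{Gr\"onwall:} &
  L(\theta(t)) - L^* &\le \bigl(L(\theta(0)) - L^*\bigr)\, e^{-\mu t}
  \quad \text{for all } t \ge 0.
\end{align*}
```

The bound holds from every initial condition with the same rate $\mu$, which is the sense in which the cost gap decays exponentially and globally; the converse direction is not covered by this sketch.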

Figures (3)

  • Figure 1: Depiction of two gradient flow solutions, one whose cost satisfies a gPŁI, and one whose cost satisfies only a satPŁI and has a globally bounded gradient.
  • Figure 2: Optimality-gap comparison for standard LQR vs. overparameterized (Ovp.) LQR on two systems, $\mathcal{G}_1$ and $\mathcal{G}_2$. Both methods start with the same initial cost, $\mathbf{K}^-(0) = K^-(0)$ and $\mathbf{K}^+(0) = K^+(0)$, with initial states $x_1(0)$ and $x_2(0)$, respectively.
  • Figure 3: The optimality gap in $\mathcal{G}_1$ using $\mathbf{K}^-(0) = K^-(0) \approx 0_{m,n}$.

Theorems & Definitions (13)

  • Definition 1: global exponential cost stability (GECS)
  • Definition 2: global Polyak–Łojasiewicz (gPŁI)
  • Lemma 1: Lemma 2 of ArthurCDC25
  • Definition 3: Saturated PŁI (satPŁI)
  • Definition 4: global linear–exponential cost stable (GLECS)
  • Lemma 2: Lemma 3 of ArthurCDC25
  • Theorem 1
  • Definition 5: Imbalance Measure
  • Proposition 1
  • Theorem 2
  • ...and 3 more