Policy Optimization in Control: Geometry and Algorithmic Implications

Shahriar Talebi, Yang Zheng, Spencer Kraisler, Na Li, Mehran Mesbahi

TL;DR

This work presents a geometric view of policy optimization for feedback control, tying together policy parameterization, stabilizability constraints, and performance objectives across LQR, LQG, and $\mathcal{H}_\infty$ problems. It develops a Riemannian framework on stabilizing policy sets, incorporating Lyapunov and quotient-geometric constructs to analyze gradients, Hessians, and retractions under both unconstrained and constrained policies. The paper highlights key phenomena, including nonconvex landscapes, spurious stationary points in LQG, and invariances under similarity transformations, and shows how quotient geometry and symmetry-aware metrics mitigate these issues for algorithmic convergence. It then translates these geometric insights into algorithmic implications, including convergent policy-gradient and quasi-Newton methods, data-driven oracle-based approaches, and links to optimal estimation via a duality with Kalman filtering, offering guidance for model-based and model-free control design with robustness considerations.

Abstract

This survey explores the geometric perspective on policy optimization in feedback control systems, emphasizing the intrinsic relationship between control design and optimization. By adopting a geometric viewpoint, we aim to provide a nuanced understanding of how various "complete parameterizations" of control design problems (that is, the policy parameters together with their Riemannian geometry) influence the stability and performance of local search algorithms. The paper is structured to address key themes such as policy parameterization, the topology and geometry of stabilizing policies, and their implications for various (non-convex) dynamic performance measures. We focus on a few iconic control design problems, including the Linear Quadratic Regulator (LQR), Linear Quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ control. In particular, we first discuss the topology and Riemannian geometry of stabilizing policies, distinguishing between their static and dynamic realizations. Expanding on this geometric perspective, we then explore structural properties of the aforementioned performance measures and their interplay with the geometry of stabilizing policies in the presence of policy constraints; along the way, we address issues such as spurious stationary points, symmetries of dynamic feedback policies, and (non-)smoothness of the corresponding performance measures. We conclude the survey with algorithmic implications of policy optimization in feedback design.

Paper Structure (29 sections, 8 theorems, 62 equations, 10 figures)

This paper contains 29 sections, 8 theorems, 62 equations, 10 figures.

Key Result

Lemma 3.1

The differential of $\mathbb{L}$ at $(A,Q) \in \mathcal{A} \times \mathbb{R}^{n\times n}$ along $(E,F) \in T_{(A,Q)} (\mathcal{A} \times \mathbb{R}^{n\times n}) \equiv \mathbb{R}^{n \times n} \times \mathbb{R}^{n\times n}$ is [displayed equation missing from this extract]. For any $A \in \mathcal{A}$ and $Q, \Sigma \in \mathbb{R}^{n\times n}$, we further have the so-called Lyapunov-trace property [displayed equation missing from this extract].
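
The lemma can be made concrete with a small numerical sketch. Everything below rests on assumptions that are not confirmed by this page: that $\mathbb{L}(A,Q)$ denotes the solution $X$ of the discrete-time Lyapunov equation $X = A X A^\top + Q$ for a Schur-stable $A$, that the Lyapunov-trace property reads $\mathrm{tr}[\mathbb{L}(A^\top,Q)\,\Sigma] = \mathrm{tr}[\mathbb{L}(A,\Sigma)\,Q]$, and that the differential is $\mathbb{L}(A,\, E X A^\top + A X E^\top + F)$, which is what differentiating the assumed equation gives. The helper names (`lyap`, `dlyap`, and so on) and the test matrices are hypothetical and chosen only for illustration.

```python
# Assumed convention (not confirmed by the text above): L(A, Q) is the solution X
# of the discrete-time Lyapunov equation X = A X A^T + Q, with A Schur stable.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y, scale=1.0):
    """Return X + scale * Y entrywise."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def lyap(A, Q, iters=500):
    """Fixed-point iteration X <- A X A^T + Q; converges when A is Schur stable."""
    At = transpose(A)
    X = [row[:] for row in Q]
    for _ in range(iters):
        X = add(matmul(matmul(A, X), At), Q)
    return X

def dlyap(A, Q, E, F):
    """Assumed differential of L at (A, Q) along (E, F):
    L(A, E X A^T + A X E^T + F) with X = L(A, Q)."""
    X = lyap(A, Q)
    At, Et = transpose(A), transpose(E)
    rhs = add(add(matmul(matmul(E, X), At), matmul(matmul(A, X), Et)), F)
    return lyap(A, rhs)

# Illustrative data; A has eigenvalues 0.4 +/- 0.1i, so it is Schur stable.
A = [[0.5, 0.2], [-0.1, 0.3]]
Q = [[2.0, 0.5], [0.5, 1.0]]
Sigma = [[1.0, -0.3], [-0.3, 4.0]]
E = [[0.1, -0.2], [0.3, 0.05]]
F = [[0.4, 0.0], [0.1, -0.2]]

# Lyapunov-trace property (assumed form): tr[L(A^T, Q) Sigma] = tr[L(A, Sigma) Q].
lhs = trace(matmul(lyap(transpose(A), Q), Sigma))
rhs = trace(matmul(lyap(A, Sigma), Q))
trace_gap = abs(lhs - rhs)

# Compare the assumed differential against a forward finite difference.
t = 1e-6
fd = add(lyap(add(A, E, t), add(Q, F, t)), lyap(A, Q), -1.0)
fd = [[v / t for v in row] for row in fd]
exact = dlyap(A, Q, E, F)
diff_gap = max(abs(a - b) for ra, rb in zip(fd, exact) for a, b in zip(ra, rb))

print("trace property gap:", trace_gap)
print("differential vs finite difference gap:", diff_gap)
```

The fixed-point iteration is used only to keep the sketch dependency-free; in practice a library solver such as `scipy.linalg.solve_discrete_lyapunov` would be the more natural choice.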

Figures (10)

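  • Figure 1: The non-convex set of stabilizing static state-feedback policies $\mathcal{S}$ for $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$.
  • Figure 2: The 2-dimensional set of stabilizing policies subject to the off-diagonal sparsity constraint. The LTI system is $A = \begin{bmatrix} 0.8 & 1 \\ 0 & 0.8 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.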
  • Figure 3: Illustration of the set of dynamic stabilizing policies $\mathcal{C}_1$ for an LTI system with $B = C = 1$ and: (a) with $A=1.1$ resulting in two path-connected components; (b) with $A=0.9$ resulting in a single path-connected component.
  • Figure 4: The region of stabilizing dynamic feedback policies $\mathcal{C}_1 \subset \mathbb{R}^3$ for the plant $(A,B,C)=(1.1,1,1)$. Each colored curve (red, purple, yellow, magenta, dark blue) is an individual orbit of policies. Note that $\mathcal{C}_1$ has 2 path-connected components.
  • Figure 5: Local retraction defined by the stability certificate. A schematic in which the gray plane exemplifies a tangent space at a point $K$ on the blue manifold. The stability certificate provides a (purple) neighborhood of the origin in every tangent space on which an efficient "local retraction" $\overline{\mathcal{R}}_K$ can be obtained, so that every tangent vector $V_K$ can be "retracted" to $\overline{\mathcal{R}}_K[\eta V_K]$ after proper scaling by the stability certificate $s_K$, with $\eta := s_K(V_K)$.
  • ...and 5 more figures

Theorems & Definitions (9)

  • Lemma 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Lemma 4.1
  • Remark 1
  • Proposition 4.2
  • Lemma 4.3
  • Theorem 4.4
  • Lemma 4.5