Policy Optimization in Control: Geometry and Algorithmic Implications
Shahriar Talebi, Yang Zheng, Spencer Kraisler, Na Li, Mehran Mesbahi
TL;DR
This work presents a geometric view of policy optimization for feedback control, tying together policy parameterization, stabilizability constraints, and performance objectives across LQR, LQG, and $\mathcal{H}_\infty$ problems. It develops a Riemannian framework on stabilizing policy sets, incorporating Lyapunov and quotient-geometric constructs to analyze gradients, Hessians, and retractions under both unconstrained and constrained policies. The paper highlights key phenomena, including nonconvex landscapes, spurious stationary points in LQG, and invariances under similarity transformations, and shows how quotient geometry and symmetry-aware metrics mitigate these issues for algorithmic convergence. It then translates these geometric insights into algorithmic implications, including convergent policy-gradient and quasi-Newton methods, data-driven oracle-based approaches, and links to optimal estimation via a duality with Kalman filtering, offering guidance for model-based and model-free control design with robustness considerations.
Abstract
This survey explores the geometric perspective on policy optimization for feedback control systems, emphasizing the intrinsic relationship between control design and optimization. By adopting a geometric viewpoint, we aim to provide a nuanced understanding of how various ``complete parameterizations'' of control design problems (that is, the policy parameters together with their Riemannian geometry) influence the stability and performance of local search algorithms. The paper is structured to address key themes such as policy parameterization, the topology and geometry of stabilizing policies, and their implications for various (non-convex) dynamic performance measures. We focus on a few iconic control design problems, including the Linear Quadratic Regulator (LQR), Linear Quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ control. In particular, we first discuss the topology and Riemannian geometry of stabilizing policies, distinguishing between their static and dynamic realizations. Expanding on this geometric perspective, we then explore structural properties of the aforementioned performance measures and their interplay with the geometry of stabilizing policies in the presence of policy constraints; along the way, we address issues such as spurious stationary points, symmetries of dynamic feedback policies, and (non-)smoothness of the corresponding performance measures. We conclude the survey with algorithmic implications of policy optimization in feedback design.
