Output-feedback Synthesis Orbit Geometry: Quotient Manifolds and LQG Direct Policy Optimization

Spencer Kraisler; Mehran Mesbahi

TL;DR

This work addresses direct policy optimization (PO) for the linear-quadratic Gaussian (LQG) problem by optimizing over dynamic output-feedback controllers, where the optimization landscape features orbit-based degeneracies that impede gradient methods. The authors introduce a coordinate-invariant Riemannian metric on the space of full-order minimal dynamic output-feedback controllers and formulate a Riemannian gradient descent (RGD) algorithm on the quotient manifold obtained by quotienting out coordinate transformations. They prove that the orbit space forms a smooth Riemannian quotient manifold and establish a local linear convergence guarantee for RGD. In numerical tests, RGD converges faster and more robustly than ordinary gradient descent. The approach provides a principled geometric framework for direct LQG policy optimization, with potential extensions to constrained PO and alternative invariant metrics.

Abstract

We consider direct policy optimization for the linear-quadratic Gaussian (LQG) setting. Over the past few years, it has been recognized that the landscape of dynamic output-feedback controllers of relevance to LQG has an intricate geometry, particularly pertaining to the existence of degenerate stationary points, that hinders gradient methods. To address these challenges, we adopt a system-theoretic coordinate-invariant Riemannian metric for the space of dynamic output-feedback controllers and develop a Riemannian gradient descent for direct LQG policy optimization. We then prove that the orbit space of such controllers, modulo the coordinate transformation, admits a Riemannian quotient manifold structure. This geometric structure, which is of independent interest, provides an effective approach to deriving direct policy optimization algorithms for LQG with a local linear convergence rate guarantee. Finally, we show that the proposed approach exhibits significantly faster and more robust numerical performance than ordinary gradient descent.
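The coordinate transformation mentioned above can be made concrete with a small numerical check. The sketch below is only an illustration under stated assumptions: the plant, weights, and controller values are made up, the time domain is assumed continuous, and the cost is evaluated with the textbook closed-loop Lyapunov formula rather than anything quoted from the paper. It shows that a controller $(A_K, B_K, C_K)$ and its transformed copy $(T A_K T^{-1}, T B_K, C_K T^{-1})$ yield the same LQG cost, which is why whole orbits of controllers are indistinguishable to the objective.

```python
import numpy as np

def lyap(A, W):
    """Solve A X + X A^T + W = 0 by Kronecker vectorization (small systems only)."""
    n = A.shape[0]
    I = np.eye(n)
    M = np.kron(A, I) + np.kron(I, A)
    return np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)

def lqg_cost(A, B, C, Q, R, W, V, AK, BK, CK):
    """LQG cost of the controller xi' = AK xi + BK y, u = CK xi (assumed formula)."""
    n, q = A.shape[0], AK.shape[0]
    Acl = np.block([[A, B @ CK], [BK @ C, AK]])
    # The Lyapunov formula is only meaningful for a stabilizing controller.
    assert np.max(np.linalg.eigvals(Acl).real) < 0, "closed loop is not stable"
    Wcl = np.block([[W, np.zeros((n, q))], [np.zeros((q, n)), BK @ V @ BK.T]])
    Qcl = np.block([[Q, np.zeros((n, q))], [np.zeros((q, n)), CK.T @ R @ CK]])
    X = lyap(Acl, Wcl)
    return float(np.trace(Qcl @ X))

# Hypothetical plant and weights, chosen only for illustration.
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.eye(1)
W, V = np.eye(2), np.eye(1)

# Hypothetical full-order, minimal, stabilizing controller.
AK = np.diag([-2.0, -3.0])
BK = np.array([[0.1], [0.2]])
CK = np.array([[0.1, -0.1]])

# Any invertible T defines another point on the same orbit.
T = np.array([[2.0, 1.0], [0.5, 3.0]])
Tinv = np.linalg.inv(T)

J = lqg_cost(A, B, C, Q, R, W, V, AK, BK, CK)
J_T = lqg_cost(A, B, C, Q, R, W, V, T @ AK @ Tinv, T @ BK, CK @ Tinv)
print(J, J_T)  # the two values agree up to floating-point error
```

Because the cost is constant along each orbit, its Euclidean gradient carries no information in the orbit directions, which is one way to read the motivation for working on the quotient instead.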

Paper Structure (12 sections, 11 theorems, 29 equations, 3 figures, 2 algorithms)

Introduction
Preliminaries and Notation
Geometry of Riemannian gradient descent
Direct PO for LQG
Krishnaprasad-Martin Metric
Coordinate-invariance of the KM Metric
Orbit Space of Output-feedback Controllers
Convergence Analysis
Limitation of gradient descent on LQG landscape
Numerical Experiments and Results
Conclusion and Future Directions
Acknowledgements

Key Result

Lemma II.1

The subset $\widetilde{\mathcal{C}}^{\min}_q \subset \widetilde{\mathcal{C}}_q$ is open and dense, and its complement has measure zero.

Figures (3)

Figure 1: Visualization of RGD; here, $x_2 = \mathcal{R}_{x_1}(-s_1\nabla f(x_1))$.
Figure 2: Illustration of a manifold and its orbit space.
Figure 3: Comparison of RGD vs. GD for LQG PO for four distinct systems.
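
The update in the Figure 1 caption, $x_2 = \mathcal{R}_{x_1}(-s_1\nabla f(x_1))$, can be sketched on a much simpler manifold than the controller space. The example below swaps in the unit sphere with its usual metric, a normalization retraction, and a quadratic objective. These choices, along with the names `retract`, `riem_grad`, and `rgd`, are illustrative assumptions and are not the paper's metric, retraction, or cost.

```python
import numpy as np

def retract(x, v):
    """Retraction on the unit sphere: step in the tangent direction, then renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def riem_grad(M, x):
    """Riemannian gradient of f(x) = x^T M x on the sphere.

    It is the Euclidean gradient 2 M x projected onto the tangent space at x.
    """
    g = 2.0 * M @ x
    return g - (x @ g) * x

def rgd(M, x0, step=0.1, iters=500):
    """Riemannian gradient descent: x_{k+1} = R_{x_k}(-s * grad f(x_k))."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = retract(x, -step * riem_grad(M, x))
    return x

# Symmetric test matrix; the minimum of f on the sphere is its smallest eigenvalue.
M = np.diag([1.0, 2.0, 5.0])
x = rgd(M, np.array([1.0, 1.0, 1.0]))
f_val = float(x @ M @ x)
print(x, f_val)  # x aligns with the first coordinate axis and f_val approaches 1.0
```

Each iterate stays on the manifold by construction, which is the role the retraction $\mathcal{R}$ plays in the caption: the gradient step is taken in the tangent space and then mapped back.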

Theorems & Definitions (19)

Lemma II.1
proof
Lemma III.1
proof
Theorem III.2
proof
Lemma IV.1
Lemma IV.2
Theorem V.2
Lemma V.3
...and 9 more
