Policy Optimization in Robust Control: Weak Convexity and Subgradient Methods
Yuto Watanabe, Feng-Yi Liao, Yang Zheng
TL;DR
This work analyzes discrete-time $\mathcal{H}_\infty$ policy optimization with static output feedback. It proves that the cost $J(K)$ is weakly convex on any convex subset of a sublevel set and, in the full-state case, satisfies a weak Polyak-Łojasiewicz inequality, which guarantees that every stationary point is globally optimal. It develops a simple subgradient method for this nonsmooth, nonconvex problem and establishes the first deterministic non-asymptotic convergence rate via Moreau envelopes, under mild boundedness assumptions. The analysis leverages a lower-$C^2$ structure and a complex-SDP representation to show uniform weak convexity on convex subsets of sublevel sets and to connect spectral functions with convex compositions. Numerical experiments validate the theory: they show feasibility and convergence for full-state feedback and illustrate landscape complexities (e.g., saddles) in static output feedback, with implications for safe, model-free robust control design.
Abstract
Robust control seeks stabilizing policies that perform reliably under adversarial disturbances, with $\mathcal{H}_\infty$ control as a classical formulation. It is known that policy optimization of robust $\mathcal{H}_\infty$ control naturally leads to nonsmooth and nonconvex problems. This paper builds on recent advances in nonsmooth optimization to analyze discrete-time static output-feedback $\mathcal{H}_\infty$ control. We show that the $\mathcal{H}_\infty$ cost is weakly convex over any convex subset of a sublevel set. This structural property allows us to establish the first non-asymptotic deterministic convergence rate for the subgradient method under suitable assumptions. In addition, we prove a weak Polyak-Łojasiewicz (PL) inequality in the state-feedback case, implying that all stationary points are globally optimal. We finally present a few numerical examples to validate the theoretical results.
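For readers unfamiliar with the terms above, the following is a brief sketch of the standard textbook definitions behind "weakly convex", "Moreau envelope", and "subgradient method". The symbols $\rho$, $\lambda$, $\alpha_t$, $G_t$, and $\mathcal{S}$ are notation assumed here for illustration and are not taken from the paper, whose precise statements, constants, and assumptions may differ.

$$
\begin{aligned}
&\text{Weak convexity on a convex set } \mathcal{S}: && K \mapsto J(K) + \tfrac{\rho}{2}\,\|K\|_F^2 \ \text{ is convex on } \mathcal{S} \text{ for some } \rho \ge 0, \\
&\text{Moreau envelope } (0 < \lambda < 1/\rho): && J_\lambda(K) = \min_{K'} \Big\{ J(K') + \tfrac{1}{2\lambda}\,\|K' - K\|_F^2 \Big\}, \\
&\text{Subgradient iteration}: && K_{t+1} = K_t - \alpha_t G_t, \qquad G_t \in \partial J(K_t).
\end{aligned}
$$

In the general theory of weakly convex optimization, a small value of $\|\nabla J_\lambda(K)\|_F$ certifies that $K$ is close to a nearly stationary point of $J$, which is why convergence rates for nonsmooth, nonconvex subgradient methods are commonly stated in terms of the Moreau envelope.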
