Policy Optimization in Robust Control: Weak Convexity and Subgradient Methods

Yuto Watanabe, Feng-Yi Liao, Yang Zheng

TL;DR

This work analyzes discrete-time $\mathcal{H}_\infty$ policy optimization with static output feedback. It proves that the cost $J(K)$ is weakly convex on any convex subset of a sublevel set and, in the full-state-feedback case, satisfies a weak Polyak-Łojasiewicz inequality, which guarantees that every stationary point is globally optimal. It develops a simple subgradient method for this nonsmooth, nonconvex problem and establishes the first deterministic non-asymptotic convergence rate via Moreau envelopes, under mild boundedness assumptions. The analysis uses a lower-$C^2$ structure and a complex-SDP representation to show uniform weak convexity on convex subsets of sublevel sets and to connect spectral functions with convex compositions. Numerical experiments validate the theory: they show feasibility and convergence for full state feedback and exhibit landscape complexities (e.g., saddle points) for static output feedback, with implications for safe, model-free robust control design.
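For readers unfamiliar with the terms used above, the following block collects the standard textbook definitions of weak convexity, the subgradient iteration, and the Moreau envelope. It is a sketch under assumptions: the symbols $\rho$, $\lambda$, $V$, and $\hat K$ are introduced here for illustration, the identities are the usual ones for a $\rho$-weakly convex function, and the paper's own statements (which restrict attention to convex subsets of sublevel sets) may differ in constants and conditions.

```latex
% rho-weak convexity of a function f on a convex set V:
f + \tfrac{\rho}{2}\|\cdot\|^2 \ \text{ is convex on } V.

% Subgradient iteration with step sizes \alpha_t > 0:
K_{t+1} = K_t - \alpha_t G_t, \qquad G_t \in \partial J(K_t).

% Moreau envelope and proximal point, for 0 < \lambda < 1/\rho:
J_\lambda(K) = \min_{K'} \Big\{ J(K') + \tfrac{1}{2\lambda}\|K' - K\|_F^2 \Big\},
\qquad
\hat K = \operatorname*{arg\,min}_{K'} \Big\{ J(K') + \tfrac{1}{2\lambda}\|K' - K\|_F^2 \Big\}.

% Why a small envelope gradient certifies near-stationarity:
\nabla J_\lambda(K) = \tfrac{1}{\lambda}\,(K - \hat K),
\qquad
\|K - \hat K\|_F = \lambda\,\|\nabla J_\lambda(K)\|_F,
\qquad
\operatorname{dist}\big(0, \partial J(\hat K)\big) \le \|\nabla J_\lambda(K)\|_F.
```

In words, when $\|\nabla J_\lambda(K)\|_F$ is small, $K$ lies close to a point $\hat K$ that is itself nearly stationary for $J$, which is why convergence rates for nonsmooth weakly convex problems are typically stated in terms of the envelope gradient.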

Abstract

Robust control seeks stabilizing policies that perform reliably under adversarial disturbances, with $\mathcal{H}_\infty$ control as a classical formulation. It is known that policy optimization of robust $\mathcal{H}_\infty$ control naturally leads to nonsmooth and nonconvex problems. This paper builds on recent advances in nonsmooth optimization to analyze discrete-time static output-feedback $\mathcal{H}_\infty$ control. We show that the $\mathcal{H}_\infty$ cost is weakly convex over any convex subset of a sublevel set. This structural property allows us to establish the first non-asymptotic deterministic convergence rate for the subgradient method under suitable assumptions. In addition, we prove a weak Polyak-Łojasiewicz (PL) inequality in the state-feedback case, implying that all stationary points are globally optimal. We finally present a few numerical examples to validate the theoretical results.
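To make the nonsmoothness and the subgradient method concrete, here is a minimal Python sketch on a hypothetical scalar system that is not taken from the paper. It assumes the dynamics $x_{t+1} = a x_t + b u_t + w_t$ with performance output $z_t = (x_t, u_t)$ and state feedback $u_t = k x_t$, with illustrative values $a = 1.2$ and $b = 1$. For this toy model the closed-loop transfer function is $(1, k)^\top / (z - c)$ with $c = a + bk$, so the $\mathcal{H}_\infty$ cost has the closed form $J(k) = \sqrt{1+k^2}/(1-|c|)$ on the stabilizing set $|c| < 1$. The cost has a kink at $c = 0$, where the peak frequency switches between $0$ and $\pi$. The function names and the constant step size are assumptions made for illustration.

```python
import math

def hinf_cost(k, a=1.2, b=1.0):
    """Closed-form H-infinity cost of the toy scalar loop; inf if not stabilizing."""
    c = a + b * k                      # closed-loop pole
    if abs(c) >= 1.0:
        return math.inf
    return math.sqrt(1.0 + k * k) / (1.0 - abs(c))

def hinf_subgradient(k, a=1.2, b=1.0):
    """One subgradient of the toy cost at a stabilizing gain k."""
    c = a + b * k
    s = (c > 0) - (c < 0)              # sign of c; 0 at the kink c == 0
    n = math.sqrt(1.0 + k * k)
    d = 1.0 - abs(c)
    # Quotient rule on n / d; at the kink, s = 0 gives the midpoint of the
    # one-sided derivatives, which is a valid subgradient.
    return (k / n * d + n * s * b) / (d * d)

def subgradient_method(k0, step=1e-3, iters=5000):
    """Constant-step subgradient method; tracks the best cost seen so far."""
    k = k0
    best = hinf_cost(k0)
    history = [best]
    for _ in range(iters):
        k_next = k - step * hinf_subgradient(k)
        if not math.isfinite(hinf_cost(k_next)):
            break                      # stop if the iterate leaves the stabilizing set
        k = k_next
        best = min(best, hinf_cost(k))
        history.append(best)
    return k, best, history

if __name__ == "__main__":
    k_final, best, history = subgradient_method(k0=-0.5)
    j_star = math.sqrt(1.0 + 1.2 ** 2)  # optimal cost of the toy problem, at k = -1.2
    print("final gain:", k_final)
    print("relative gap of best iterate:", (best - j_star) / j_star)
```

In this toy problem the minimizer $k = -a/b$ sits exactly at the kink, so the iterates oscillate around it with an amplitude proportional to the step size, and it is the best iterate (not the last one) that is tracked. This is only meant to illustrate the mechanics; it does not reproduce the paper's experiments or its convergence guarantees.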

Paper Structure

This paper contains 16 sections, 9 theorems, 59 equations, 4 figures.

Key Result

Lemma 1

Suppose Assumptions \ref{assumption:stablizability} and \ref{assumption:K_nonempty} hold. Consider the $\mathcal{H}_\infty$ optimization problem with static linear policies in \eqref{eq:policy-optimization-main}. Then the following statements hold:

Figures (4)

  • Figure 1: Nonconvex and nonsmooth landscape in $\mathcal{H}_\infty$ optimization. (a)-(b) Nonsmoothness of the cost function $J$ in Example 1; (c) disconnectedness of $\mathcal{K}$ in Example 2 with $\alpha=0.13$; (d) nonconvexity of $J$ in Example 2 with $\alpha=0.13$.
  • Figure 2: Plots of $f$ and $f+\frac{5}{4}\|\cdot-1\|^2$ on $V=[0,5]$ in Example 3.
  • Figure 3: Spurious stationary points of $J$ on the set $\mathcal{K}$ in the case of static output feedback in \ref{example:saddle_local_min} (the same system as Example 2 with $\alpha=0.14$). Here $J(K)$ possesses not only a local minimum but also a saddle point, represented by the red dot.
  • Figure 4: Simulation results of the subgradient method \eqref{eq:subGM} with $\alpha_t=\alpha^1=10^{-3}$ or $\alpha^2=10^{-4}$: (a) plot of $\min_t |J(K_t)-J^\star|/J^\star$ for the system \eqref{eq:example-2} with full state feedback in Example 1, starting from $K_0=K_0^1$ or $K_0^2$, where $K_0^1=[0, -1.9]$ and $K_0^2=[0.2,-2]$; (b) plots of $\min_t J(K_t)$ (upper) and $\min_t \|G_t\|_F^2$ with $G_t\in\partial J(K_t)$ (lower) for \ref{example:saddle_local_min} with $K_0 = K_0^1=[0, 0]$ or $K_0^2=[-5,-2]$.

Theorems & Definitions (21)

  • Lemma 1
  • Example 1: Nonsmoothness of function $J$
  • Example 2: Nonconvexity of domain $\mathcal{K}$ and function $J$
  • Remark 1: $\mathcal{H}_\infty$ optimization in continuous-time systems
  • Definition 1: Lower-$C^2$ (Rockafellar and Wets, Variational Analysis, 2009)
  • Lemma 2
  • proof
  • Theorem 1: Weak convexity
  • Example 3: Weak convexity
  • Lemma 3
  • ...and 11 more