Model-Free $μ$-Synthesis: A Nonsmooth Optimization Perspective

Darioush Keivan; Xingang Guo; Peter Seiler; Geir Dullerud; Bin Hu

TL;DR

This paper examines two model-free policy optimization strategies for $μ$-synthesis, the model-free non-derivative sampling method and zeroth-order policy search with uniform smoothing, and demonstrates that both consistently replicate the design outcomes achieved by their model-based counterparts.

Abstract

In this paper, we revisit model-free policy search on an important robust control benchmark, namely $μ$-synthesis. In the general output-feedback setting, no convex formulation of this problem exists, and hence global optimality guarantees are not expected. Apkarian (2011) presented a nonconvex nonsmooth policy optimization approach for this problem and achieved state-of-the-art design results using subgradient-based policy search algorithms, which generate update directions in a model-based manner. Despite the lack of convexity and global optimality guarantees, these subgradient-based policy search methods have led to impressive numerical results in practice. Building on this policy optimization perspective, our paper extends these subgradient-based search methods to a model-free setting. Specifically, we examine the effectiveness of two model-free policy optimization strategies: the model-free non-derivative sampling method and zeroth-order policy search with uniform smoothing. We perform an extensive numerical study to demonstrate that both methods consistently replicate the design outcomes achieved by their model-based counterparts. Additionally, we provide theoretical justifications showing that convergence guarantees to stationary points can be established for our model-free $μ$-synthesis under assumptions related to the coerciveness of the cost function. Overall, our results demonstrate that derivative-free policy optimization offers a competitive and viable approach for solving general output-feedback $μ$-synthesis problems in the model-free setting.
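To make the idea of zeroth-order policy search with uniform smoothing concrete, here is a minimal sketch of a gradient estimator that uses only cost evaluations. It is an illustration under stated assumptions, not the paper's algorithm: the two-point form of the estimator, the function names (`sample_unit_sphere`, `zeroth_order_gradient`), and the toy nonsmooth cost `toy_cost` are all hypothetical stand-ins. In the model-free setting the cost would instead come from querying a simulator at the perturbed policy parameters.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def sample_unit_sphere(dim: int, rng: random.Random) -> Vector:
    """Draw a direction uniformly from the unit sphere in R^dim."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:  # a zero draw has probability zero, but guard anyway
            return [c / norm for c in v]


def zeroth_order_gradient(
    cost: Callable[[Sequence[float]], float],
    k: Sequence[float],
    delta: float,
    num_samples: int,
    rng: random.Random,
) -> Vector:
    """Two-point estimate of the gradient of the uniformly smoothed cost.

    The smoothed cost is the average of cost(k + delta * v) over v drawn
    uniformly from the unit ball. Only cost evaluations are used.
    """
    dim = len(k)
    grad = [0.0] * dim
    for _ in range(num_samples):
        u = sample_unit_sphere(dim, rng)
        k_plus = [ki + delta * ui for ki, ui in zip(k, u)]
        k_minus = [ki - delta * ui for ki, ui in zip(k, u)]
        scale = dim * (cost(k_plus) - cost(k_minus)) / (2.0 * delta)
        for i in range(dim):
            grad[i] += scale * u[i] / num_samples
    return grad


def toy_cost(k: Sequence[float]) -> float:
    """Hypothetical nonsmooth cost standing in for a simulator query."""
    return abs(k[0]) + 2.0 * abs(k[1])


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = zeroth_order_gradient(toy_cost, [1.0, -1.0], 0.1, 20000, rng)
    print(estimate)  # should be close to the true gradient (1, -2) at this point
```

A policy search step would then move the parameters against this estimate, for example `k = [ki - step * gi for ki, gi in zip(k, estimate)]` with a hypothetical step size `step`. Smaller values of `delta` track the original cost more closely but amplify evaluation noise in the estimate.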

Paper Structure (17 sections, 2 theorems, 25 equations, 3 figures, 1 table, 2 algorithms)

Introduction
Problem Formulations and Preliminaries
Setup of $\mu$-synthesis.
Policy optimization formulation.
Problem statement: model-free policy search.
Review: Subgradient methods in the model-based setting.
Main Results: Model-Free Algorithms and Theoretical Justifications
Non-derivative Sampling
Derivative-free Optimization with Randomized Smoothing
Theoretical Justifications
Numerical Experiments
Doyle's Example
Higher Dimension Examples
Conclusions
Completing the proof for Lemma \ref{lem1}:
...and 2 more sections

Key Result

lemma 1

Suppose $B := \left[ \right]$ and $C := \left[ \right]^\top$ are full rank matrices. Then the objective function $J_c(K_c)$ defined by eq:opt_new is coercive over the set $\mathcal{K}_c$ in the sense that for any sequence $\{K_c^l\}_{l=1}^\infty \subset \mathcal{K}_c$ we have $J_c(K_c^l) \rightarrow +\infty$ if either $\|K_c^l\|_F \rightarrow +\infty$, or $K_c^l$ converges to an element in the boundary $\partial \mathcal{K}_c$.
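
One standard reason coerciveness of this kind matters for policy search can be sketched as follows. This is a generic argument, not a reproduction of the paper's proof, and it assumes in addition that $J_c$ is continuous on $\mathcal{K}_c$. The sublevel set notation $\mathcal{S}_\gamma$ is introduced here only for illustration.

```latex
% Sketch: coerciveness plus continuity gives compact sublevel sets.
\[
  \mathcal{S}_\gamma := \{ K_c \in \mathcal{K}_c : J_c(K_c) \le \gamma \}
\]
% Boundedness: if some sequence in S_gamma had \|K_c^l\|_F \to +\infty,
% coerciveness would force J_c(K_c^l) \to +\infty, contradicting J_c \le \gamma.
% Closedness: let K_c^l \in S_gamma converge to a limit \bar{K}.
% If \bar{K} \in \partial\mathcal{K}_c, coerciveness again gives
% J_c(K_c^l) \to +\infty, a contradiction. Hence \bar{K} \in \mathcal{K}_c,
% and continuity yields J_c(\bar{K}) \le \gamma, so \bar{K} \in S_gamma.
\[
  \text{$\mathcal{S}_\gamma$ is closed and bounded, hence compact, for every } \gamma \in \mathbb{R}.
\]
```

Under these assumptions, any search method whose iterates do not increase the cost beyond its initial value stays inside a compact subset of $\mathcal{K}_c$, which is the usual starting point for stationarity arguments.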

Figures (3)

Figure 1: Interconnection for Robust Synthesis
Figure 2: Algorithm \ref{alg:DF_PO} iterates in policy space for Doyle's example.
Figure 3: Normalized deviation of trajectories from the MATLAB musyn function outputs, denoted as $\mu_{DK}$, for state dimensions $n_x \in \{10, 20, 30\}$. Left: Algorithm \ref{alo:NS}. Right: Algorithm \ref{alg:DF_PO}. Solid lines depict the mean values, and the shaded regions represent the 98% confidence intervals.

Theorems & Definitions (5)

definition 1
lemma 1
proof
lemma 2
proof
