Model-Free $μ$-Synthesis: A Nonsmooth Optimization Perspective

Darioush Keivan, Xingang Guo, Peter Seiler, Geir Dullerud, Bin Hu

TL;DR

This paper examines two model-free policy optimization strategies for $μ$-synthesis, the model-free non-derivative sampling method and zeroth-order policy search with uniform smoothing, and demonstrates that both consistently replicate the design outcomes achieved by their model-based counterparts.

Abstract

In this paper, we revisit model-free policy search on an important robust control benchmark, namely $μ$-synthesis. In the general output-feedback setting, no convex formulation of this problem is known, and hence global optimality guarantees are not expected. Apkarian (2011) presented a nonconvex nonsmooth policy optimization approach for this problem and achieved state-of-the-art design results using subgradient-based policy search algorithms that generate update directions in a model-based manner. Despite the lack of convexity and global optimality guarantees, these subgradient-based policy search methods have led to impressive numerical results in practice. Building on this policy optimization perspective, our paper extends these subgradient-based search methods to a model-free setting. Specifically, we examine the effectiveness of two model-free policy optimization strategies: the model-free non-derivative sampling method and zeroth-order policy search with uniform smoothing. We perform an extensive numerical study to demonstrate that both methods consistently replicate the design outcomes achieved by their model-based counterparts. Additionally, we provide theoretical justification showing that convergence guarantees to stationary points can be established for our model-free $μ$-synthesis under some assumptions related to the coerciveness of the cost function. Overall, our results demonstrate that derivative-free policy optimization offers a competitive and viable approach for solving general output-feedback $μ$-synthesis problems in the model-free setting.
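To make the idea of zeroth-order policy search with uniform smoothing concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a generic two-point estimator of the gradient of the uniformly smoothed cost, built from random directions on the unit sphere, followed by plain gradient steps. The names `zeroth_order_gradient`, `zeroth_order_search`, and `toy_cost`, as well as all step sizes and sample counts, are hypothetical choices for illustration. The toy cost is a simple nonsmooth function standing in for a closed-loop cost that would, in practice, be obtained from a simulator.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zeroth_order_gradient(cost, k, delta, num_samples, rng):
    """Two-point estimate of the gradient of the uniformly smoothed cost.

    Only cost evaluations are used; no model or derivative is required.
    """
    dim = len(k)
    grad = [0.0] * dim
    for _ in range(num_samples):
        u = sample_unit_sphere(dim, rng)
        k_plus = [ki + delta * ui for ki, ui in zip(k, u)]
        k_minus = [ki - delta * ui for ki, ui in zip(k, u)]
        slope = (cost(k_plus) - cost(k_minus)) / (2.0 * delta)
        for i in range(dim):
            grad[i] += dim * slope * u[i] / num_samples
    return grad


def zeroth_order_search(cost, k0, step=0.05, delta=0.05,
                        num_samples=20, iters=300, seed=0):
    """Gradient-descent-style search driven by zeroth-order estimates.

    Returns the best parameter vector seen and its cost.
    """
    rng = random.Random(seed)
    k = list(k0)
    best_k, best_cost = list(k), cost(k)
    for _ in range(iters):
        grad = zeroth_order_gradient(cost, k, delta, num_samples, rng)
        k = [ki - step * gi for ki, gi in zip(k, grad)]
        current = cost(k)
        if current < best_cost:
            best_k, best_cost = list(k), current
    return best_k, best_cost


def toy_cost(k):
    """Hypothetical nonsmooth stand-in cost (NOT a mu-synthesis objective)."""
    return max(abs(k[0] - 1.0), abs(k[1] + 2.0))
```

Calling `zeroth_order_search(toy_cost, [4.0, 1.0])` runs the search from a point on a kink of the toy cost. Tracking the best iterate rather than the last one is a design choice in this sketch, since the noisy estimates do not decrease the cost monotonically.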

Paper Structure (17 sections, 2 theorems, 25 equations, 3 figures, 1 table, 2 algorithms)

This paper contains 17 sections, 2 theorems, 25 equations, 3 figures, 1 table, 2 algorithms.

Key Result

lemma 1

Suppose $B := \left[ \right]$ and $C := \left[ \right]^\top$ are full rank matrices. Then the objective function $J_c(K_c)$ defined by eq:opt_new is coercive over the set $\mathcal{K}_c$ in the sense that for any sequence $\{K_c^l\}_{l=1}^\infty \subset \mathcal{K}_c$ we have $J_c(K_c^l) \rightarrow +\infty$ if either $\|K_c^l\|_F \rightarrow +\infty$ or $K_c^l$ converges to an element of the boundary $\partial \mathcal{K}_c$.

Figures (3)

  • Figure 1: Interconnection for Robust Synthesis
  • Figure 2: Iterates of Algorithm \ref{alg:DF_PO} in policy space for Doyle's example.
  • Figure 3: Left: Normalized deviation of the trajectories of Algorithm \ref{alo:NS} from the outputs of the MATLAB musyn function, denoted $\mu_{DK}$, for state dimensions $n_x \in \{10, 20, 30\}$. Right: Normalized deviation of the trajectories of Algorithm \ref{alg:DF_PO} from the same musyn outputs $\mu_{DK}$, for state dimensions $n_x \in \{10, 20, 30\}$. Solid lines depict the mean values, and the shaded regions represent the 98% confidence intervals.

Theorems & Definitions (5)

  • definition 1
  • lemma 1
  • proof
  • lemma 2
  • proof