Policy Gradient Methods for Designing Dynamic Output Feedback Controllers

Tomonori Sadamoto, Takumi Hirai

TL;DR

The paper tackles the challenge of designing dynamic output feedback controllers for discrete-time partially observable systems using policy-gradient methods. It introduces an $L$-length input-output history (IOH) framework that recasts dynamic output feedback as a state-feedback problem on an IOH-embedded system, enabling a model-based PGM with global linear convergence via the Polyak–Łojasiewicz inequality applied to a lossless projection of the IOH dynamics. It also develops model-free, zeroth-order PGM variants with Monte Carlo gradient estimates and provides a rigorous sample-complexity analysis, supported by numerical simulations that show robustness to noise and scalability to larger networks. Collectively, this work advances data-driven control by delivering provable convergence guarantees and practical learning algorithms for dynamic output feedback in partially observed settings.
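The link between the Polyak–Łojasiewicz (PL) inequality and linear convergence can be made concrete with the standard textbook argument below. It is a generic sketch, not the paper's proof: the smoothness constant $\ell$, the PL constant $\mu$, and the step size $1/\ell$ are assumed symbols that do not appear in this summary, and for control costs such constants typically hold only on a sublevel set of $J$.

```latex
% Assumptions (hypothetical constants): J is \ell-smooth and satisfies the PL inequality
%   \tfrac{1}{2}\,\|\nabla J(K)\|^2 \;\ge\; \mu\,\bigl(J(K) - J(K^{\star})\bigr).
% Gradient step: K_{i+1} = K_i - \tfrac{1}{\ell}\,\nabla J(K_i).
\begin{align*}
J(K_{i+1})
  &\le J(K_i) + \langle \nabla J(K_i),\, K_{i+1} - K_i \rangle
      + \tfrac{\ell}{2}\,\|K_{i+1} - K_i\|^2
      && \text{(smoothness)} \\
  &= J(K_i) - \tfrac{1}{2\ell}\,\|\nabla J(K_i)\|^2
      && \text{(insert the step)} \\
  &\le J(K_i) - \tfrac{\mu}{\ell}\,\bigl(J(K_i) - J(K^{\star})\bigr)
      && \text{(PL inequality)}
\end{align*}
% Subtracting J(K^{\star}) from both sides gives the linear rate
%   J(K_{i+1}) - J(K^{\star}) \le \bigl(1 - \tfrac{\mu}{\ell}\bigr)\,\bigl(J(K_i) - J(K^{\star})\bigr).
```

No convexity is needed in this argument, which is why a PL inequality is enough to obtain a global rate for a non-convex cost.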

Abstract

This paper proposes model-based and model-free policy gradient methods (PGMs) for designing dynamic output feedback controllers for discrete-time partially observable systems. To this end, we first show that any dynamic output feedback controller design is equivalent to a state-feedback controller design for a newly introduced system whose internal state is a finite-length input-output history (IOH). Next, based on this equivalence, we propose a model-based PGM and show its global linear convergence by proving that the Polyak–Łojasiewicz inequality holds for a reachability-based lossless projection of the IOH dynamics. Moreover, we propose two model-free implementations of the PGM: the multi-episodic and single-episodic PGMs. The former is a Monte Carlo approximation of the model-based PGM, whereas the latter is a simplified version of the former for ease of use in real systems. A sample complexity analysis of both methods is also presented. Finally, the effectiveness of the model-based and model-free PGMs is investigated through a numerical simulation.
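To illustrate what a Monte Carlo, zeroth-order gradient estimate can look like, here is a minimal Python sketch. It is not the paper's Algorithm 1: it uses a generic two-point random-direction estimator, and the function name `zeroth_order_grad`, the parameters `radius` and `num_samples`, and the toy quadratic cost are all assumptions made for illustration. In a control setting, `cost` would be replaced by a rollout cost evaluated on the closed-loop system.

```python
import numpy as np


def zeroth_order_grad(cost, K, radius=0.05, num_samples=200, rng=None):
    """Two-point zeroth-order estimate of the gradient of cost at K.

    Hypothetical helper: only evaluations of cost are used, no model.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = K.size  # number of free parameters in the gain
    grad = np.zeros(K.shape, dtype=float)
    for _ in range(num_samples):
        # Draw a direction uniformly on the unit sphere (Frobenius norm).
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        # Symmetric finite difference of the cost along U.
        delta = cost(K + radius * U) - cost(K - radius * U)
        grad += (d / (2.0 * radius)) * delta * U
    return grad / num_samples


# Toy usage (assumed cost): a quadratic whose exact gradient is 2 * (K - K_target).
K_target = np.array([[1.0, -2.0, 0.5, 0.0]])


def toy_cost(K):
    return float(np.sum((K - K_target) ** 2))


K = np.zeros((1, 4))
estimate = zeroth_order_grad(toy_cost, K, num_samples=20000,
                             rng=np.random.default_rng(0))
exact = 2.0 * (K - K_target)
print(np.linalg.norm(estimate - exact) / np.linalg.norm(exact))
```

The two-point form is chosen here because the symmetric difference cancels the constant part of the cost and usually has lower variance than a one-point estimate. The price is two cost evaluations per sampled direction.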

Paper Structure

This paper contains 23 sections, 14 theorems, 121 equations, 10 figures, 1 algorithm.

Key Result

Lemma 1

Consider ${\bm \Sigma}_{\rm s}$ in 1 and $v$ in def_IOH. If holds, then for any $u$ and $x(0)$, the IOH $v$ and output $y$ obey where $\Gamma \coloneqq \left[{\mathcal{R}}_L({\bm \Sigma}_{\rm s}) - A^L{\mathcal{O}}_L^{\dagger}({\bm \Sigma}_{\rm s}){\mathcal{H}}_L({\bm \Sigma}_{\rm s}), ~A^L{\mathcal{O}}_L^{\dagger}({\bm \Sigma}_{\rm s})\right]$, and $\Pi \coloneqq [0_{m \times (L-1)
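The form of $\Gamma$ matches the standard reconstruction of the state from $L$ past inputs and outputs, $x(t) = \Gamma v(t)$, which holds when the $L$-step observability matrix has full column rank. The Python sketch below checks that identity numerically on a small example. Several details are assumptions, because def_IOH and the system equations are not shown in this excerpt: the system is taken as $x(t+1) = Ax(t) + Bu(t)$, $y(t) = Cx(t)$ with no feedthrough, the IOH is stacked as all past inputs followed by all past outputs, oldest first, and the names `ioh_state_map` and `reconstruction_error` are hypothetical.

```python
import numpy as np


def ioh_state_map(A, B, C, L):
    """Build Gamma = [R_L - A^L O_L^+ H_L, A^L O_L^+] (assumed orderings)."""
    m = B.shape[1]
    r = C.shape[0]
    powers = [np.linalg.matrix_power(A, k) for k in range(L + 1)]
    # Observability matrix O_L = [C; CA; ...; CA^(L-1)].
    O = np.vstack([C @ powers[k] for k in range(L)])
    # Reachability matrix R_L = [A^(L-1)B, ..., AB, B], oldest input first.
    R = np.hstack([powers[L - 1 - k] @ B for k in range(L)])
    # Block lower-triangular Toeplitz matrix H_L of Markov parameters.
    H = np.zeros((r * L, m * L))
    for i in range(L):
        for j in range(i):
            H[i * r:(i + 1) * r, j * m:(j + 1) * m] = C @ powers[i - j - 1] @ B
    O_pinv = np.linalg.pinv(O)
    return np.hstack([R - powers[L] @ O_pinv @ H, powers[L] @ O_pinv])


def reconstruction_error(A, B, C, L, x0, inputs):
    """Simulate L steps and compare x(L) with Gamma @ v(L)."""
    x = np.array(x0, dtype=float)
    u_hist, y_hist = [], []
    for u in inputs[:L]:
        y_hist.append(C @ x)   # output measured before the state update
        u_hist.append(u)
        x = A @ x + B @ u
    v = np.concatenate(u_hist + y_hist)  # IOH: inputs first, then outputs
    return np.max(np.abs(x - ioh_state_map(A, B, C, L) @ v))


# Small observable example (assumed values) with n = 2 states and L = 2.
A = np.array([[0.9, 0.2], [0.0, 0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
rng = np.random.default_rng(0)
err = reconstruction_error(A, B, C, L=2,
                           x0=rng.standard_normal(2),
                           inputs=rng.standard_normal((2, 1)))
print(err)
```

Because the state is a linear function of the IOH, any state-feedback law $u = Fx$ can be written as a static gain $F\Gamma$ acting on $v$. This is the intuition behind treating dynamic output feedback as state feedback on the IOH.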

Figures (10)

  • Figure 1: (Blue solid line) Variation of $J(K_i)$ in \ref{defJ} over the iterations $i$ of \ref{gd} when $L=2$. (Black dotted line) $J(K^{\star})$, where $K^{\star}$ is given by \ref{optK}.
  • Figure 2: Bode diagrams of ${\bm K}^{\star}_{\rm s}$ and ${\bm K}_{{\rm s},i}$ for the iterations $i$ indicated by the circles in Figure \ref{fig_J_MB}, where ${\bm K}^{\star}_{\rm s}$ and ${\bm K}_{{\rm s},i}$ are defined as in \ref{dyn_K} with \ref{ABCD_hat} for $K^{\star}$ and the corresponding $K_i$, respectively.
  • Figure 3: The blue solid line and the red and green dotted lines show the trajectories of $y \in \mathbb R^2$ in \ref{1} when $u = K_{\rm SF}^{\star}x$, ${\bm K}_{{\rm s},50\times 10^5}^{(2)}$, and ${\bm K}_{{\rm s},50\times 10^5}^{(4)}$ are actuated at $t=0$, $t=2$, and $t=4$, respectively.
  • Figure 4: Variation of $J(K_i)$ in \ref{defJ} when $L=4$.
  • Figure 5: (Colored area) 50 variations of $J(\tilde{K}_i)$ in \ref{defJ} for $\{s, N\} = \{1,50\}, \{1,500\}, \{10,500\}$ when $L=2$, where $\tilde{K}_i$ is generated by Algorithm 1. (Colored broken lines) The average of the corresponding 50 variations.
  • ...and 5 more figures

Theorems & Definitions (34)

  • Definition 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Remark 1
  • Proposition 1
  • proof
  • Remark 2
  • Remark 3
  • ...and 24 more