Policy Gradient Methods for Designing Dynamic Output Feedback Controllers
Tomonori Sadamoto, Takumi Hirai
TL;DR
The paper addresses the design of dynamic output feedback controllers for discrete-time partially observable systems using policy gradient methods (PGMs). It introduces an $L$-length input-output history (IOH) framework that recasts dynamic output feedback design as a state-feedback problem on an IOH-embedded system. On this basis it proposes a model-based PGM and proves global linear convergence by showing that the Polyak–Łojasiewicz inequality holds for a reachability-based lossless projection of the IOH dynamics. It also develops model-free, zeroth-order PGM variants that use Monte Carlo gradient estimates, together with a sample-complexity analysis. Numerical simulations show robustness to noise and scalability to larger networks.
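For reference, the Polyak–Łojasiewicz (PL) inequality can be stated in its generic textbook form as follows. The symbols $J$, $J^\star$, $\mu$, and $\beta$ are generic placeholders assumed here for illustration; they are not the paper's notation or constants. For a differentiable cost $J$ with minimum value $J^\star$, the PL inequality with constant $\mu > 0$ reads

$$\tfrac{1}{2}\,\|\nabla J(K)\|^2 \;\ge\; \mu\,\bigl(J(K) - J^\star\bigr).$$

If $J$ is also $\beta$-smooth on the relevant sublevel set, the gradient step $K_{i+1} = K_i - \tfrac{1}{\beta}\nabla J(K_i)$ satisfies

$$J(K_{i+1}) - J^\star \;\le\; \Bigl(1 - \tfrac{\mu}{\beta}\Bigr)\bigl(J(K_i) - J^\star\bigr),$$

which is the sense in which convergence is called linear: the optimality gap contracts by a fixed factor at every iteration, without requiring convexity of $J$.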
Abstract
This paper proposes model-based and model-free policy gradient methods (PGMs) for designing dynamic output feedback controllers for discrete-time partially observable systems. To this end, we first show that any dynamic output feedback controller design is equivalent to a state-feedback controller design for a newly introduced system whose internal state is a finite-length input-output history (IOH). Next, based on this equivalence, we propose a model-based PGM and show its global linear convergence by proving that the Polyak–Łojasiewicz inequality holds for a reachability-based lossless projection of the IOH dynamics. Moreover, we propose two model-free implementations of the PGM: the multi-episodic and single-episodic PGMs. The former is a Monte Carlo approximation of the model-based PGM, whereas the latter is a simplified version of the former for ease of use in real systems. A sample complexity analysis of both methods is also presented. Finally, the effectiveness of the model-based and model-free PGMs is investigated through a numerical simulation.
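To make the IOH idea concrete, the following is a minimal sketch of a model-free, zeroth-order policy gradient loop on a toy system. Everything in it is an assumption made for illustration rather than the paper's algorithm: the two-state plant, the cost weights, the ordering of the history vector, and the names `rollout_cost`, `zeroth_order_gradient`, and `train` are hypothetical. It also uses a two-point (central-difference) estimator with noise-free rollouts, which is a lower-variance swap-in and may differ from the estimators analyzed in the paper.

```python
import math
import random

# Hypothetical toy plant (not from the paper):
#   x_{t+1} = A x_t + B u_t,   y_t = C x_t   (only the first state is measured)
A = [[0.5, 0.2], [0.0, 0.4]]
B = [0.0, 1.0]
C = [1.0, 0.0]
L = 2     # history length (assumed; equals the state dimension here)
T = 30    # rollout horizon
R = 0.1   # input weight in the stage cost y^2 + R * u^2


def rollout_cost(K, x0=(1.0, 1.0)):
    """Finite-horizon cost of the IOH feedback u_t = K z_t, where
    z_t = [u_{t-1}, ..., u_{t-L}, y_{t-1}, ..., y_{t-L}] (assumed ordering)."""
    x = list(x0)
    u_hist = [0.0] * L  # most recent entry first
    y_hist = [0.0] * L
    cost = 0.0
    for _ in range(T):
        z = u_hist + y_hist
        u = sum(k * zi for k, zi in zip(K, z))
        y = C[0] * x[0] + C[1] * x[1]
        cost += y * y + R * u * u
        x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
             A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u]
        u_hist = [u] + u_hist[:-1]
        y_hist = [y] + y_hist[:-1]
    return cost


def unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def zeroth_order_gradient(K, n_samples, radius, rng):
    """Two-point Monte Carlo gradient estimate built from rollout costs only."""
    d = len(K)
    grad = [0.0] * d
    for _ in range(n_samples):
        U = unit_sphere(d, rng)
        K_plus = [k + radius * ui for k, ui in zip(K, U)]
        K_minus = [k - radius * ui for k, ui in zip(K, U)]
        slope = (rollout_cost(K_plus) - rollout_cost(K_minus)) / (2.0 * radius)
        for i in range(d):
            grad[i] += d * slope * U[i] / n_samples
    return grad


def train(iterations=200, step=0.05, n_samples=20, radius=0.01, seed=0):
    """Plain gradient descent on the IOH gain, starting from K = 0."""
    rng = random.Random(seed)
    K = [0.0] * (2 * L)
    for _ in range(iterations):
        g = zeroth_order_gradient(K, n_samples, radius, rng)
        K = [k - step * gi for k, gi in zip(K, g)]
    return K


K_init = [0.0] * (2 * L)
K_learned = train()
print("cost with K = 0:      ", rollout_cost(K_init))
print("cost with learned K:  ", rollout_cost(K_learned))
```

The point of the sketch is that the learner never touches `A`, `B`, or `C` directly: it only evaluates rollout costs for perturbed gains, and the gain acts on past inputs and outputs rather than on the unmeasured state. The toy plant is open-loop stable so that the zero gain is a valid starting point; an unstable plant would need a stabilizing initial gain.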
