Output-Feedback Stabilizing Policy Iteration for Convergence Assurance of Unknown Discrete-Time Systems with Unmeasurable States

Dongdong Li, Jiuxiang Dong

TL;DR

The paper addresses the stabilization of unknown discrete-time (DT) linear systems with unmeasured states using only input-output (IO) data. It develops a data-driven output-feedback stabilizing policy iteration (SPI) that compresses the system dynamics, reconstructs a virtual state via Kronecker-based augmentation, and updates the policy with a model-free step-size rule that preserves stability. Key contributions include online learning of output-feedback stabilizing controllers without observers or large parameterization matrices, a rigorous IO-data-driven policy evaluation and improvement framework, and stability guarantees supported by a proof and by simulations. The approach enables stabilization of unknown DT systems when full state measurements are unavailable.

Abstract

This note proposes a data-driven output-feedback stabilizing policy iteration for unknown linear discrete-time systems with unmeasurable states. Existing policy iteration methods for optimal control must start from a stabilizing control policy, which is particularly challenging to obtain for unknown systems, especially when states are unavailable. In such cases, it is more difficult to guarantee stability and convergence performance. To address this problem, an output-feedback stabilizing policy iteration framework is developed to learn closed-loop stabilizing control policies while ensuring convergence performance. Specifically, cumulative scalar parameters are introduced to compress the original system to a stable scale. Then, by integrating modified policy iteration with parameter update rules, the system is gradually amplified/restored to the original system while preserving stability such that the stabilizing control policy is obtained. The entire process is driven solely by input-output data. Moreover, a stability analysis is provided for output-feedback. The proposed approach is validated by simulations.
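The compress-then-restore idea described in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: it assumes a known scalar model with a measurable state, whereas the paper works from input-output data only. The function name `stabilizing_pi_scalar`, the initial scale `s0`, the weights `q` and `r`, and the step fraction `delta` are all hypothetical choices made for illustration. The scale `s` plays the role of a cumulative scalar parameter: the scaled system `(s*a, s*b)` starts out stable under zero gain, one policy-iteration step is applied, and `s` is then enlarged only as far as the current gain can tolerate.

```python
def stabilizing_pi_scalar(a, b, s0, q=1.0, r=1.0, delta=0.5, max_iter=200):
    """Illustrative scalar sketch for x[t+1] = a*x[t] + b*u[t] with u[t] = -k*x[t].

    Assumption: a and b are known here; the paper replaces every model-based
    quantity below with input-output data.
    """
    if not s0 * abs(a) < 1.0:
        raise ValueError("s0 must compress the open loop so that s0*|a| < 1")
    s, k = s0, 0.0  # zero gain stabilizes the compressed system
    for _ in range(max_iter):
        a_s, b_s = s * a, s * b          # compressed (scaled) system
        a_cl = a_s - b_s * k             # scaled closed loop, |a_cl| < 1
        # Policy evaluation: scalar Lyapunov equation p = q + r*k^2 + a_cl^2 * p
        p = (q + r * k * k) / (1.0 - a_cl * a_cl)
        # Policy improvement: k = b_s * p * a_s / (r + b_s^2 * p)
        k = a_s * b_s * p / (r + b_s * b_s * p)
        # Scale update (hypothetical model-based rule): move a fraction delta
        # of the way towards 1/rho, so that s * rho stays below 1.
        rho = abs(a - b * k)
        inv_rho = float("inf") if rho == 0.0 else 1.0 / rho
        s = min(1.0, s + delta * (inv_rho - s))
        if s >= 1.0:                     # original system restored
            return k, s, rho
    raise RuntimeError("scale did not reach 1 within max_iter iterations")


if __name__ == "__main__":
    k, s, rho = stabilizing_pi_scalar(a=2.0, b=1.0, s0=0.45)
    print(f"gain k = {k:.4f}, scale s = {s:.2f}, |a - b*k| = {rho:.4f}")
```

Because the updated scale always lies strictly between the old scale and `1/rho`, the current gain keeps the scaled closed loop stable at every step. When the scale is capped at 1, the same inequality gives `|a - b*k| < 1` for the original system.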

Paper Structure

This paper contains 10 sections, 32 equations, 2 figures, and 1 algorithm.

Figures (2)

  • Figure 1: Closed-loop spectral radius $\rho(A - B\hat{\bar{K}}^{j}\bar{M}^{+})$ and its upper bound $1/(\tilde{\beta} + \sum_{m=0}^{j} \alpha^{m})$, where $\bar{M}^{+} = \bar{M}^{\top}(\bar{M}\bar{M}^{\top})^{-1}$: (a) $\delta=0.1$; (b) $\delta=0.4$; (c) $\delta=0.7$; (d) $\delta=0.9$.
  • Figure 2: System state trajectories with the learned output-feedback controller and without a controller.
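
A short worked note on the quantities in the Figure 1 caption may help when reading the plot. It is only a sketch, and it assumes that $\bar{M}$ has full row rank (so that $\bar{M}\bar{M}^{\top}$ is invertible) and that the plotted bound holds at iteration $j$:

$$
\bar{M}\bar{M}^{+} = \bar{M}\bar{M}^{\top}(\bar{M}\bar{M}^{\top})^{-1} = I,
\qquad
\rho\bigl(A - B\hat{\bar{K}}^{j}\bar{M}^{+}\bigr) \le \frac{1}{\tilde{\beta} + \sum_{m=0}^{j}\alpha^{m}} < 1
\quad \text{whenever} \quad \tilde{\beta} + \sum_{m=0}^{j}\alpha^{m} > 1.
$$

Under these assumptions, $\bar{M}^{+}$ is a right inverse of $\bar{M}$, and the closed-loop spectral radius is certified to be below one as soon as the denominator of the bound exceeds one.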