Adaptive Optimal Control of Linear Periodic Systems: An Off-Policy Value Iteration Approach

Bo Pang, Zhong-Ping Jiang

TL;DR

This work tackles infinite-horizon optimal control for continuous-time linear periodic systems by developing a value-iteration-based adaptive dynamic programming (ADP) method that is off-policy and uses Fourier basis representations to approximate the periodic Riccati solution $P^*(t)$ from data, without exact knowledge of the dynamics or an initial stabilizing controller. The authors establish that the learned approximations converge uniformly to the optimal $H^*(t)$ and $K^*(t)$ under mild assumptions, and demonstrate the approach on a triple inverted pendulum with a periodic load, highlighting robustness to disturbances. By separating algorithmic time from system time and leveraging data-driven differentiation of an equation derived from the periodic Riccati equation (PRE), the method yields practical, stabilizing controllers directly from measurements. The results advance data-driven, stability-guaranteed control for time-varying, periodic systems and open avenues for extensions to related problems such as optimal output regulation.
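
For orientation, the textbook form of the periodic linear-quadratic problem behind the PRE can be written as follows. This is a sketch in standard notation; the symbols $A$, $B$, $Q$, $R$ and the period $T$ are assumptions here, and the paper's own notation and weighting choices may differ.

```latex
% System and cost (all coefficient matrices assumed T-periodic)
\dot{x}(t) = A(t)\,x(t) + B(t)\,u(t), \qquad
J = \int_{0}^{\infty} \bigl( x^{\top} Q(t)\, x + u^{\top} R(t)\, u \bigr)\, dt

% Periodic Riccati equation (PRE) for the periodic solution P^*(t) = P^*(t+T)
-\dot{P}(t) = A^{\top}(t) P(t) + P(t) A(t) + Q(t)
              - P(t) B(t) R^{-1}(t) B^{\top}(t) P(t)

% Optimal feedback gain and control law
K^*(t) = R^{-1}(t)\, B^{\top}(t)\, P^*(t), \qquad u(t) = -K^*(t)\, x(t)
```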

Abstract

This paper studies the infinite-horizon adaptive optimal control of continuous-time linear periodic (CTLP) systems. A novel value iteration (VI) based off-policy ADP algorithm is proposed for a general class of CTLP systems, so that approximate optimal solutions can be obtained directly from the collected data, without the exact knowledge of system dynamics. Under mild conditions, the proofs on uniform convergence of the proposed algorithm to the optimal solutions are given for both the model-based and model-free cases. The VI-based ADP algorithm is able to find suboptimal controllers without assuming the knowledge of an initial stabilizing controller. Application to the optimal control of a triple inverted pendulum subjected to a periodically varying load demonstrates the feasibility and effectiveness of the proposed method.
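
The Fourier-basis idea mentioned in the TL;DR can be illustrated with a minimal sketch: fit each entry of a $T$-periodic matrix function by least squares over a truncated Fourier basis. This is not the paper's algorithm. The function names, the toy matrix `P_toy`, the period, and the number of harmonics are all hypothetical choices made for illustration, and the samples here come from a known function rather than from measured trajectories.

```python
import numpy as np

def fourier_basis(t, period, n_harmonics):
    """Rows [1, cos(wt), sin(wt), ..., cos(Nwt), sin(Nwt)] for each time in t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    omega = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    for n in range(1, n_harmonics + 1):
        cols.append(np.cos(n * omega * t))
        cols.append(np.sin(n * omega * t))
    return np.stack(cols, axis=1)  # shape (len(t), 2N + 1)

def fit_periodic_matrix(t_samples, M_samples, period, n_harmonics):
    """Least-squares Fourier weights for every entry of a periodic matrix."""
    Phi = fourier_basis(t_samples, period, n_harmonics)
    Y = M_samples.reshape(M_samples.shape[0], -1)  # flatten each sample
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return W  # shape (2N + 1, rows * cols)

def eval_periodic_matrix(W, t, period, shape):
    """Evaluate the fitted matrix function at the times in t."""
    n_harmonics = (W.shape[0] - 1) // 2
    Phi = fourier_basis(t, period, n_harmonics)
    return (Phi @ W).reshape((-1,) + tuple(shape))

# Toy example (hypothetical): a symmetric 2x2 matrix with period T = 2.
T = 2.0
w = 2.0 * np.pi / T

def P_toy(t):
    off = 0.5 * np.sin(2 * w * t)
    return np.array([[2.0 + np.cos(w * t), off],
                     [off, 1.0 + 0.3 * np.cos(w * t)]])

t_s = np.linspace(0.0, T, 50, endpoint=False)   # sample times over one period
M_s = np.stack([P_toy(t) for t in t_s])         # sampled matrices, shape (50, 2, 2)
W = fit_periodic_matrix(t_s, M_s, T, n_harmonics=2)

t_test = np.array([0.1, 0.77, 1.9, 2.3])        # includes a time beyond one period
P_hat = eval_periodic_matrix(W, t_test, T, (2, 2))
max_err = max(np.abs(P_hat[i] - P_toy(t)).max() for i, t in enumerate(t_test))
```

Because the toy matrix contains only the first two harmonics, two harmonics in the basis reproduce it up to rounding error. For a general periodic function, the truncation level trades approximation error against the number of weights to estimate.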

Paper Structure

This paper contains 9 sections, 8 theorems, 66 equations, 2 figures, 1 table.

Key Result

Lemma 1

There exists a unique SPPS (symmetric, periodic, positive semidefinite) solution $P^*(\cdot)$ of the PRE, and the corresponding closed-loop system is stable, if and only if Assumption structure_assum is satisfied. In addition,

Figures (2)

  • Figure 1: Overview of derivations and convergence analysis of Algorithm \ref{VI_Alg_off}.
  • Figure 2: Comparison of different control gains. $\bar{K}(\cdot)$ is the output of Algorithm \ref{VI_Alg_off}; $\hat{K}_k$ is defined in (\ref{HK_hat}); $K(\cdot)$ is generated by model-based VI (\ref{PRE_s}); $K^*(\cdot)$ is the optimal control gain; $\hat{W}^{K}$ is generated by (\ref{VI_adp_eqn1}); $\bar{W}^{K}$ is given in (\ref{VI_final_fit}).

Theorems & Definitions (20)

  • Definition 1: Bittanti1991, doi:10.1080/00207179208934305
  • Lemma 1
  • Lemma 2: FourierBook
  • Lemma 3
  • proof
  • Theorem 1
  • proof
  • Remark 1
  • Remark 2
  • Remark 3
  • ...and 10 more