Cooperative Optimal Output Tracking for Discrete-Time Multiagent Systems: Stabilizing Policy Iteration Frameworks and Analysis

Dongdong Li; Jiuxiang Dong

TL;DR

This work addresses cooperative optimal output tracking (COOT) for discrete-time, linear multi-agent systems with unknown dynamics. It develops two model-free frameworks: (1) a stabilizing policy-iteration (PI) approach that can start from an arbitrary policy, and (2) an equivalent $Q$-learning PI framework, both enabling data-driven computation of stabilizing and optimal feedback gains and the regulator equations. The authors introduce explicit iteration-step schemes to guarantee stability, separately solve regulator equations, and demonstrate model-free convergence to the optimal gains $K_i^*$ and feedforward terms $T_i$ with validated simulations. The proposed data-driven approaches reduce data requirements and improve efficiency while providing stability and optimality guarantees, offering practical impact for distributed control of MASs with uncertain dynamics.

Abstract

In this paper, two model-free optimal output tracking frameworks based on policy iteration are proposed for discrete-time multi-agent systems. First, we establish a stabilizing policy iteration framework that can start from any initial feedback control policy, which relaxes the dependence of traditional policy iteration on an initial stabilizing control policy. Then, an efficient and equivalent $Q$-learning policy iteration framework is developed, which is shown to require less system data to obtain the same results as the stabilizing policy iteration. Both frameworks obtain a stabilizing control policy by iterating a stabilizing virtual closed-loop system step by step toward the actual closed-loop system. Multiple explicit schemes for the iteration step-size/coefficient are designed, and stability during these iterations is analyzed. Using the resulting stabilizing control policy and the two frameworks, the optimal feedback control gain is obtained. The approximate solution of the regulator equations is found by model-free iteration, which leads to the optimal feedforward gain. Finally, cooperative optimal output tracking is realized by a distributed feedforward-feedback controller. The proposed algorithms are validated by simulation.
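The idea of moving a stabilizing virtual closed-loop system step by step toward the actual one can be illustrated with a small model-based sketch. It is only an illustration under stated assumptions, not the paper's model-free algorithms or its step-size schemes: the virtual system is taken to be the scaled pair $(sA, sB)$ with $0 < s \le 1$, the scaling update rule is a hypothetical choice, and the matrices `A`, `B`, `Q`, `R` and the function names are made up for the example. Policy evaluation and improvement follow the standard discrete-time LQR policy iteration with $u = -Kx$.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(max(abs(np.linalg.eigvals(M))))

def solve_dlyap(Ac, Qc):
    """Solve P = Ac^T P Ac + Qc by vectorization (Ac must be Schur stable)."""
    n = Ac.shape[0]
    lhs = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    P = np.linalg.solve(lhs, Qc.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, Q, R, K, iters):
    """Standard PI for discrete-time LQR; K must stabilize A - B K."""
    for _ in range(iters):
        P = solve_dlyap(A - B @ K, Q + K.T @ R @ K)        # policy evaluation
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # policy improvement
    return K, P

def stabilizing_gain(A, B, Q, R, margin=0.9, max_outer=100):
    """Assumed scaling scheme: start from K = 0 on a shrunken (virtual)
    system s*A, s*B and raise s toward 1 while keeping s*(A - B K) stable."""
    n, m = B.shape
    K = np.zeros((m, n))
    s = min(1.0, margin / max(spectral_radius(A), 1e-12))
    for _ in range(max_outer):
        K, _ = policy_iteration(s * A, s * B, Q, R, K, iters=3)
        if s == 1.0:
            return K  # K now stabilizes the actual system
        rho = max(spectral_radius(A - B @ K), 1e-12)
        # Midpoint between s and 1/rho keeps s_new * rho < 1.
        s = min(1.0, 0.5 * (s + 1.0 / rho))
    raise RuntimeError("scaling did not reach 1 within max_outer steps")

# Hypothetical open-loop unstable example (not taken from the paper).
A = np.array([[1.1, 0.3], [0.0, 1.2]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

K0 = stabilizing_gain(A, B, Q, R)                        # no stabilizing guess needed
K_opt, P_opt = policy_iteration(A, B, Q, R, K0, iters=30)
print(spectral_radius(A - B @ K0), spectral_radius(A - B @ K_opt))
```

In this sketch the first phase only produces a stabilizing gain; the second call then runs ordinary policy iteration on the actual system from that gain, mirroring the two-stage structure described in the abstract.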

Paper Structure (21 sections, 14 theorems, 95 equations, 8 figures, 2 algorithms)

Introduction
Problem formulation and preliminaries
Problem formulation
Preliminaries
Model-based solution to the COOT problem
Model-based solution to regulator equations
Model-based solution to the optimization problem by a stabilizing PI
Data-driven solution to the COOT by off-policy
Data collection and stabilizing policy computation
Determine initial coefficient $\tilde{\beta}_{i}+\alpha_{i}^{0}$
Choose iteration step-size $\alpha_{i}^{k+1}$ by model-free methods
Stabilizing PI-based data-driven COOT Algorithm 1
$Q$-learning algorithm based on stabilizing PI
$Q$-learning establishment and stabilizing policy computation
Determine $\tilde{\beta}_{i}$ and $\alpha_{i}^{k+1}$ by $Q$-learning.
...and 6 more sections

Key Result

Lemma 1

[feng2024discrete] Given the system (1), (2) and the observer (3), under Assumption A1, for any initial $E_{i}(0)$, $F_{i}(0)$ and $\zeta_{i}(0)$ with $i=1,2,\ldots,N$, we have $\lim_{t\rightarrow\infty}\tilde{E}_{i}(t)=0$, $\lim_{t\rightarrow\infty}\tilde{F}_{i}(t)=0$ and $\lim_{t\rightarrow\infty}\tilde{\ze

Figures (8)

Figure 1: The communication graph.
Figure 2: (a) The outputs of the MASs obtained by using Algorithm 1; (b) the outputs of the MASs obtained by using Algorithm 2.
Figure 3: Verification of Lemmas 4 and 7: (a) the closed-loop system spectral radius $\rho(A_{i}-B_{i}\tilde{K}_{i}^{k})$ obtained by using Algorithm 1 with Scheme 2, where $a_{i}=0.5$; (b) $\rho(A_{i}-B_{i}\tilde{K}_{i}^{k})$ obtained by using Algorithm 2 with Scheme A, where $a_{i}=0.5$.
Figure 4: (a) Optimality error $\|P_{i}^{j}-P_{i}^*\|$ obtained by using Algorithm 1; (b) optimal feedback policy error $\|K_{i}^{j}-K_{i}^*\|$ obtained by using Algorithm 1.
Figure 5: (a) The error $\|\check{\chi}_{i}^{n}-\check{\chi}_{i}^*\|$ obtained by using Algorithm 1; (b) the error $\|\check{\chi}_{i}^{n}-\check{\chi}_{i}^*\|$ obtained by using Algorithm 2, where $\check{\chi}_{i}^{n}=vec([(X_{i}^{n})^{T},(U_{i}^{n})^{T}]^{T})$ and $\check{\chi}_{i}^{*}=vec([(X_{i}^{*})^{T},(U_{i}^{*})^{T}]^{T})$.
...and 3 more figures

Theorems & Definitions (26)

Lemma 1
Lemma 2
Theorem 1
Lemma 3
Remark 1
Theorem 2
Remark 2
Lemma 4
Remark 3
Lemma 5
...and 16 more
