Cooperative Optimal Output Tracking for Discrete-Time Multiagent Systems: Stabilizing Policy Iteration Frameworks and Analysis
Dongdong Li, Jiuxiang Dong
TL;DR
This work addresses cooperative optimal output tracking (COOT) for discrete-time linear multi-agent systems (MASs) with unknown dynamics. It develops two model-free frameworks: (1) a stabilizing policy-iteration (PI) approach that can start from an arbitrary feedback policy, and (2) an equivalent $Q$-learning PI framework that needs less system data to reach the same result. Both compute stabilizing and optimal feedback gains from data and solve the regulator equations separately by model-free iteration. The authors design explicit schemes for the iteration step size that guarantee stability, show convergence to the optimal feedback gains $K_i^*$ and the feedforward terms $T_i$, and validate the algorithms in simulation. The result is a data-driven route to distributed control of MASs with uncertain dynamics that carries stability and optimality guarantees.
Abstract
In this paper, two model-free optimal output tracking frameworks based on policy iteration are proposed for discrete-time multi-agent systems. First, we establish a stabilizing policy iteration framework that can start from any initial feedback control policy, removing the requirement of traditional policy iteration that the initial control policy be stabilizing. Then, an equivalent and more efficient $Q$-learning policy iteration framework is developed, which is shown to require less system data to obtain the same results as the stabilizing policy iteration. Both frameworks obtain a stabilizing control policy by iterating step by step from a stabilizing virtual closed-loop system to the actual closed-loop system. Multiple explicit schemes for the iteration step size (coefficient) are designed, and stability during these iterations is analyzed. Starting from the resulting stabilizing control policy, the two frameworks are then used to obtain the optimal feedback control gain. An approximate solution of the regulator equations is found by model-free iteration, which yields the optimal feedforward gain. Finally, cooperative optimal output tracking is realized by a distributed feedforward-feedback controller. The proposed algorithms are validated by simulation.
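
As a point of reference for the initial-stabilizing-policy requirement mentioned in the abstract, the following is a minimal sketch of classical model-based policy iteration for a scalar discrete-time LQR problem. It is not the paper's model-free algorithm. The system $x(k+1) = a x(k) + b u(k)$, the cost weights $q$ and $r$, and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch only: classical (model-based) policy iteration for a
# scalar discrete-time LQR problem. All names and the scalar setting are
# assumptions for illustration, not taken from the paper.

def evaluate_policy(a, b, q, r, k):
    """Return the cost coefficient p of the policy u = -k * x.

    p solves the scalar Lyapunov equation
        p = q + r * k**2 + (a - b * k)**2 * p,
    which has a finite positive solution only if |a - b * k| < 1.
    """
    a_cl = a - b * k  # closed-loop pole under the current policy
    if abs(a_cl) >= 1.0:
        raise ValueError("policy is not stabilizing; its cost is unbounded")
    return (q + r * k * k) / (1.0 - a_cl * a_cl)


def improve_policy(a, b, r, p):
    """Return the gain that is greedy with respect to the cost coefficient p."""
    return a * b * p / (r + b * b * p)


def policy_iteration(a, b, q, r, k0, tol=1e-12, max_iter=100):
    """Alternate evaluation and improvement, starting from the gain k0."""
    k = k0
    p = None
    for _ in range(max_iter):
        p = evaluate_policy(a, b, q, r, k)
        k_next = improve_policy(a, b, r, p)
        if abs(k_next - k) < tol:
            return k_next, p
        k = k_next
    return k, p


if __name__ == "__main__":
    # k0 = 0.5 places the closed-loop pole at 0.7, so evaluation is well defined.
    k_opt, p_opt = policy_iteration(a=1.2, b=1.0, q=1.0, r=1.0, k0=0.5)
    print("gain:", k_opt, "cost coefficient:", p_opt)

    # k0 = 0.0 leaves the unstable open-loop pole at 1.2, so evaluation fails.
    try:
        policy_iteration(a=1.2, b=1.0, q=1.0, r=1.0, k0=0.0)
    except ValueError as err:
        print("cannot start from k0 = 0:", err)
```

In this sketch the policy-evaluation step is only defined when the current gain is stabilizing, which is why the classical scheme cannot start from an arbitrary gain. The stabilizing frameworks described in the abstract are aimed at removing that restriction.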
