Stackelberg Meta-Learning Based Control for Guided Cooperative LQG Systems

Yuhan Zhao; Quanyan Zhu

TL;DR

The paper addresses guided cooperative control under incomplete follower information by formulating a dynamic Stackelberg game and introducing a meta-learning framework to acquire a transferable follower-response model. A parametric follower response is embedded into an augmented LQG (via $\tilde{A}$ and $\tilde{B}$) and optimized through a bilevel meta-learning scheme that trains on multiple follower types and adapts to new ones with limited data. The approach combines a data-fitting term with leader-cost optimization through a Riccati-based gradient pipeline, including a data-sampling strategy for robust meta-training and a practical adaptation rule for new followers. Experiments in robot teaming demonstrate improved transferability and guidance efficiency over unilateral and individual learning, highlighting the practical impact for heterogeneous multi-agent control tasks.

Abstract

Guided cooperation allows intelligent agents with heterogeneous capabilities to work together through a leader-follower type of interaction. However, the associated control problem becomes challenging when the leader agent does not have complete information about the follower agents, so cooperation plans must be learned and adapted. To this end, we develop a meta-learning-based Stackelberg game-theoretic framework to address the challenges of guided cooperative control for linear systems. We first formulate the guided cooperation between agents as a dynamic Stackelberg game and use the feedback Stackelberg equilibrium as the agent-wise cooperation strategy. We further leverage meta-learning to address the incomplete information about follower agents: the leader agent learns a meta-response model offline from a prescribed set of followers and adapts to a new cooperation task with a small amount of learning data. We use a case study in robot teaming to corroborate the effectiveness of our framework. Comparison with other learning approaches also shows that our learned cooperation strategy provides better transferability across different cooperation tasks.
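To make the idea of embedding a follower response into the leader's control problem concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical linear response model $u^F = K_x x + K_u u^L$ and dynamics $x_{t+1} = A x_t + B_L u^L_t + B_F u^F_t$; the names `augment`, `riccati_gains`, `K_x`, `K_u`, `B_L`, and `B_F` are illustrative and do not come from the paper. Under these assumptions the follower's response folds into augmented matrices $\tilde{A} = A + B_F K_x$ and $\tilde{B} = B_L + B_F K_u$, and the leader's feedback gains follow from a standard backward Riccati recursion. With additive noise and a fully observed state, these gains coincide with those of the deterministic LQR problem.

```python
import numpy as np


def augment(A, B_L, B_F, K_x, K_u):
    """Fold an ASSUMED linear follower response u_F = K_x x + K_u u_L
    into the dynamics, giving x_{t+1} = A_tilde x_t + B_tilde u_L."""
    A_tilde = A + B_F @ K_x
    B_tilde = B_L + B_F @ K_u
    return A_tilde, B_tilde


def riccati_gains(A_t, B_t, Q, R, Q_T, horizon):
    """Finite-horizon backward Riccati recursion for the leader.

    Returns the gains [K_0, ..., K_{T-1}] (leader control u_L = -K_t x)
    and the cost-to-go matrix P_0 at the initial stage.
    """
    P = Q_T
    gains = []
    for _ in range(horizon):
        S = R + B_t.T @ P @ B_t
        K = np.linalg.solve(S, B_t.T @ P @ A_t)
        # P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
        P = Q + A_t.T @ P @ (A_t - B_t @ K)
        gains.append(K)
    gains.reverse()  # computed backward in time, returned forward in time
    return gains, P


# Illustrative numbers only (hypothetical, not taken from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B_L = np.array([[0.0], [0.1]])
B_F = np.array([[0.0], [0.05]])
K_x = np.array([[-0.5, -0.2]])  # assumed follower reaction to the state
K_u = np.array([[0.3]])         # assumed follower reaction to the leader

A_tilde, B_tilde = augment(A, B_L, B_F, K_x, K_u)
gains, P0 = riccati_gains(A_tilde, B_tilde, np.eye(2), np.eye(1), np.eye(2), 50)
```

In a learning setting, `K_x` and `K_u` would be the parameters fitted to follower response data, and the leader's cost would depend on them through the recursion above; how the paper parameterizes and trains the response model is not specified in this summary.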

Paper Structure (22 sections, 19 equations, 4 figures, 1 algorithm)

Introduction
Problem Formulation
Stackelberg Games for Cooperative Control
Meta Response and Meta-learning Objectives
Interpretation on $\gamma$
Meta-Learning as Bilevel Optimization Problems
Stackelberg Meta-Learning
Parametric Optimal Control
Meta-Response Training
Sampling Follower's Response Data
Response Adaptation
Experiments and Evaluations
Meta-learning Results
Comparison with Unilateral Learning
Individual Learning and Transferability
...and 7 more sections

Figures (4)

Figure 1: Meta training and adaptation results.
Figure 2: Trajectories for $\theta=0$ follower after adaptation.
Figure 3: Costs and trajectories for unilateral learning.
Figure 4: Results for individual learning.
