Stackelberg Meta-Learning for Strategic Guidance in Multi-Robot Trajectory Planning

Yuhan Zhao; Quanyan Zhu

TL;DR

This work addresses trajectory guidance in which a leader guides diverse followers whose decision models are unknown. It casts the leader-follower interaction as a dynamic Stackelberg game and introduces a meta-learning framework that learns a meta-best-response model, which can be adapted quickly to a new follower from limited data. The leader then performs receding-horizon planning with a follower-specific best-response (BR) model obtained through adaptation, combined with optimization based on Pontryagin's maximum principle (PMP), and produces collision-free trajectories in cluttered environments. Experiments show that meta-learning improves generalization and adaptation speed compared with non-meta baselines, and that guidance yields substantial performance gains over zero-guidance scenarios, demonstrating practical value for heterogeneous multi-robot coordination.
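To make the receding-horizon loop concrete, here is a minimal Python sketch under strong simplifying assumptions that do not come from the paper: both robots are 2-D single integrators, the follower's adapted best response is replaced by a hypothetical linear rule (`follower_best_response`, which moves the follower a fraction `gain` of the way toward the leader), obstacles are ignored, and a brute-force search over constant leader velocities stands in for the PMP-based optimization. All function names and parameters are illustrative. At each step the leader predicts the follower's reaction over a short horizon, executes only the first move, and re-plans.

```python
import numpy as np

def follower_best_response(x_f, x_l, gain):
    # Hypothetical follower model (assumption): step a fraction `gain` toward the leader.
    return x_f + gain * (x_l - x_f)

def rollout_cost(x_l, x_f, u_l, gain, goal, horizon, effort_weight):
    # Predict `horizon` steps with the leader holding velocity u_l and the follower
    # reacting through its best response; accumulate follower goal error + leader effort.
    cost = 0.0
    for _ in range(horizon):
        x_l = x_l + u_l
        x_f = follower_best_response(x_f, x_l, gain)
        cost += np.sum((x_f - goal) ** 2) + effort_weight * np.sum(u_l ** 2)
    return cost

def receding_horizon_guidance(x_l, x_f, goal, gain=0.5, horizon=5, steps=40,
                              speed=0.5, n_dirs=16, effort_weight=0.1):
    # Candidate leader velocities: n_dirs headings at a fixed speed, plus standing still.
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    candidates = [speed * np.array([np.cos(a), np.sin(a)]) for a in angles]
    candidates.append(np.zeros(2))
    follower_path = [x_f.copy()]
    for _ in range(steps):
        # Plan: choose the candidate with the lowest predicted cost over the horizon.
        costs = [rollout_cost(x_l, x_f, u, gain, goal, horizon, effort_weight)
                 for u in candidates]
        u_best = candidates[int(np.argmin(costs))]
        # Execute only the first step of the plan, then re-plan from the new state.
        x_l = x_l + u_best
        x_f = follower_best_response(x_f, x_l, gain)
        follower_path.append(x_f.copy())
    return np.array(follower_path)

# Leader and follower both start at [0, 8]; the goal is [9, 9] (cf. Figure 3).
path = receding_horizon_guidance(np.array([0.0, 8.0]), np.array([0.0, 8.0]),
                                 goal=np.array([9.0, 9.0]))
print(np.round(path[-1], 2))
```

Re-planning at every step lets the leader correct for prediction errors; in the paper's setting, the hypothetical linear rule would be replaced by the follower-specific best-response model obtained through adaptation.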

Abstract

In trajectory guidance, a leader robot assists a follower robot in cooperatively reaching a target destination. However, planning this cooperation becomes difficult when the leader serves a family of different followers and has incomplete information about them, so the leader must learn cooperation plans and adapt them quickly to each follower. We develop a Stackelberg meta-learning approach to address this challenge. We first formulate the guided trajectory planning problem as a dynamic Stackelberg game to capture the leader-follower interactions. Then, we leverage meta-learning to develop cooperative strategies for different followers. The leader learns a meta-best-response model from a prescribed set of followers. When a specific follower initiates a guidance query, the leader quickly adapts the meta-model to a follower-specific model using a small amount of data and uses it to perform trajectory guidance. Simulations show that our method achieves better generalization and adaptation performance in learning followers' behavior than other learning approaches. A comparison with zero-guidance scenarios further demonstrates the value and effectiveness of guidance.
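The meta-train-then-adapt idea can be illustrated with a small sketch. It assumes, hypothetically, that each follower type responds linearly to a vector of leader features (`y = W_i x`, with `W_i` a shared matrix `W_BASE` plus a small type-specific perturbation), and it uses Reptile, a first-order meta-learning method, for meta-training; the paper's actual best-response model, training algorithm, and importance sampling are not reproduced here. Adaptation is a few gradient steps on ten samples from an unseen follower, and a zero initialization serves as an illustrative non-meta baseline. `DIM`, `W_BASE`, and all helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 3  # hypothetical size of the leader-feature and follower-response vectors

# Hypothetical follower family: type i responds as y = W_i x, where W_i is a
# shared matrix plus a small type-specific perturbation.
W_BASE = rng.normal(size=(DIM, DIM))

def sample_follower():
    return W_BASE + 0.1 * rng.normal(size=(DIM, DIM))

def sample_data(W, n):
    X = rng.normal(size=(n, DIM))                      # leader trajectory features
    Y = X @ W.T + 0.01 * rng.normal(size=(n, DIM))     # observed follower responses
    return X, Y

def loss(W, X, Y):
    # Mean squared prediction error of the best-response model W.
    return np.mean(np.sum((X @ W.T - Y) ** 2, axis=1))

def grad(W, X, Y):
    # Gradient of `loss` with respect to W.
    return 2.0 * (X @ W.T - Y).T @ X / len(X)

def adapt(W, X, Y, steps=5, lr=0.05):
    # Few-shot adaptation: plain gradient descent starting from W.
    for _ in range(steps):
        W = W - lr * grad(W, X, Y)
    return W

def reptile(iterations=1000, meta_lr=0.1, shots=10):
    # First-order meta-learning (Reptile): move the meta-parameters toward the
    # weights adapted to each sampled follower.
    phi = np.zeros((DIM, DIM))
    for _ in range(iterations):
        X, Y = sample_data(sample_follower(), shots)
        phi = phi + meta_lr * (adapt(phi, X, Y) - phi)
    return phi

phi = reptile()
W_new = sample_follower()                  # an unseen follower
X_tr, Y_tr = sample_data(W_new, 10)        # small adaptation set
X_te, Y_te = sample_data(W_new, 200)       # held-out evaluation set
meta_loss = loss(adapt(phi, X_tr, Y_tr), X_te, Y_te)
scratch_loss = loss(adapt(np.zeros((DIM, DIM)), X_tr, Y_tr), X_te, Y_te)
print(f"meta-init: {meta_loss:.4f}  zero-init: {scratch_loss:.4f}")
```

Because the meta-initialization already lies close to every follower in this toy family, the same few adaptation steps bring it much closer to the new follower's responses than starting from scratch, which mirrors the fast-adaptation argument in the abstract.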

Paper Structure (18 sections, 17 equations, 4 figures, 1 table, 3 algorithms)

Introduction
Problem Formulation
Trajectory Guidance as Stackelberg Games
Guidance in Trajectory Planning
Meta-Best-Response and Meta-Learning Problem
Stackelberg Meta-Learning
Meta-Best-Response Training
Importance Sampling
Best-Response Adaptation
Receding Horizon Planning for Trajectory Guidance
Simulations and Evaluations
Simulation Settings
Meta-Learning Results
Receding Horizon Planning
Comparison With Zero Guidance
...and 3 more sections

Figures (4)

Figure 1: Illustration of the Stackelberg meta-learning approach in trajectory guidance. Different follower UGVs rely on the leader UAV's trajectory guidance to reach their destinations. The leader UAV learns a meta-best-response model by interacting with different followers (1-2). When guiding a specific follower, the leader uses follower-specific data (3) to adapt the meta-model to that follower (4) and performs guided trajectory planning (5).
Figure 2: Adaptation results for three learning approaches. Meta-learning provides the best generalization and adaptation performance. The Param-Ave approach yields a large adaptation error and poor generalization; its loss is divided by 2 in both plots for better visualization.
Figure 3: Guidance trajectories for different followers. Blue and orange denote the leader and follower trajectories, respectively. Followers start from $[0,8]$ and $[6,0]$ and aim to reach the goal region centered at $[9,9]$. The leader successfully guides all followers to the destination using adapted best-response models and receding horizon planning.
Figure 4: Myopic trajectories for two types of followers starting from $[0,8]$, $[0,4]$, and $[6,0]$. None of them reaches the destination. The color map represents the follower's sensing cost; the type 3 follower has a wider sensing region than the type 5 follower.
