Stackelberg Meta-Learning for Strategic Guidance in Multi-Robot Trajectory Planning
Yuhan Zhao, Quanyan Zhu
TL;DR
This work addresses trajectory guidance in which a leader guides diverse followers whose decision models are unknown. It casts the leader-follower interaction as a dynamic Stackelberg game and introduces a meta-learning framework that learns a meta-best-response model, which can be quickly adapted to a new follower using limited data. The leader then adapts this model into a follower-specific best-response (BR) model and uses it for receding-horizon planning with optimization based on Pontryagin's Maximum Principle (PMP), achieving collision-free trajectories in cluttered environments. Experiments show that meta-learning improves generalization and adaptation speed compared with non-meta baselines, and that guidance yields substantial performance gains over zero-guidance scenarios, demonstrating practical value for heterogeneous multi-robot coordination.
Abstract
Trajectory guidance requires a leader robotic agent to help a follower robotic agent cooperatively reach the target destination. However, planning this cooperation becomes difficult when the leader serves a family of different followers and has incomplete information about them. The leader therefore needs to learn cooperation plans and adapt them quickly to each follower. We develop a Stackelberg meta-learning approach to address this challenge. We first formulate the guided trajectory planning problem as a dynamic Stackelberg game to capture the leader-follower interactions. Then, we leverage meta-learning to develop cooperative strategies for different followers. The leader learns a meta-best-response model from a prescribed set of followers. When a specific follower initiates a guidance query, the leader quickly adapts the meta-model to a follower-specific model using a small amount of learning data and uses it to perform trajectory guidance. Simulations show that our method achieves better generalization and adaptation performance in learning the followers' behavior than other learning approaches. The value and effectiveness of guidance are further demonstrated by comparison with zero-guidance scenarios.
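To make the "learn a meta-best-response, then adapt with little data" idea concrete, here is a minimal sketch under strong simplifying assumptions. It is not the authors' algorithm. Each follower's best response to a leader feature vector is assumed to be a linear map `y = W_i x`, and followers are assumed to share structure (`W_i = W_shared` plus a small perturbation). Meta-learning is illustrated with a swapped-in first-order Reptile-style update. All names (`sample_follower`, `adapt`, `reptile_meta_train`, `run_demo`) and hyperparameters are hypothetical. The demo compares few-shot adaptation for a new follower starting from the meta-learned initialization against adaptation starting from zeros.

```python
import numpy as np

# Hypothetical setup (assumption, not from the paper): each follower's best
# response (BR) to a leader feature vector x is a linear map y = W_i x, and
# followers share structure: W_i = W_shared + small perturbation.
D_IN, D_OUT = 4, 2


def sample_follower(rng, w_shared, spread=0.1):
    """Draw one follower's BR matrix (unknown to the leader)."""
    return w_shared + spread * rng.standard_normal(w_shared.shape)


def sample_data(rng, w_true, n, noise=0.01):
    """Leader features X (n, D_IN) and observed follower responses Y (n, D_OUT)."""
    x = rng.standard_normal((n, D_IN))
    y = x @ w_true.T + noise * rng.standard_normal((n, D_OUT))
    return x, y


def mse(w, x, y):
    """Mean over samples of the squared BR prediction error."""
    r = x @ w.T - y
    return float(np.sum(r ** 2) / len(x))


def adapt(w_init, x, y, steps=5, lr=0.1):
    """Few-step gradient descent on one follower's data (inner loop)."""
    w = w_init.copy()
    for _ in range(steps):
        r = x @ w.T - y
        w -= lr * (2.0 / len(x)) * r.T @ x  # gradient of mse with respect to w
    return w


def reptile_meta_train(rng, w_shared, iters=500, meta_lr=0.1, n=20):
    """Reptile-style meta-training (swapped-in technique): move the shared
    initialization toward the weights adapted to each sampled follower."""
    w_meta = np.zeros((D_OUT, D_IN))
    for _ in range(iters):
        w_task = sample_follower(rng, w_shared)
        x, y = sample_data(rng, w_task, n)
        w_meta += meta_lr * (adapt(w_meta, x, y) - w_meta)
    return w_meta


def run_demo(seed=0, n_shot=10):
    rng = np.random.default_rng(seed)
    w_shared = rng.standard_normal((D_OUT, D_IN))
    w_meta = reptile_meta_train(rng, w_shared)

    # A new follower issues a guidance query with only n_shot samples.
    w_new = sample_follower(rng, w_shared)
    x_few, y_few = sample_data(rng, w_new, n_shot)
    x_test, y_test = sample_data(rng, w_new, 1000)

    meta_loss = mse(adapt(w_meta, x_few, y_few), x_test, y_test)
    zero_loss = mse(adapt(np.zeros_like(w_meta), x_few, y_few), x_test, y_test)
    return w_shared, w_meta, meta_loss, zero_loss


if __name__ == "__main__":
    _, _, meta_loss, zero_loss = run_demo()
    print(f"test MSE after adaptation: meta-init {meta_loss:.4f}, zero-init {zero_loss:.4f}")
```

Because the meta-learned initialization already sits near the structure that followers share, a handful of gradient steps on a few samples should bring it much closer to the new follower's BR than the same steps from zeros. In the paper's setting, the BR model class, the meta-learning update, and the downstream PMP-based receding-horizon planner are all richer than this linear toy.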
