Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many Robots
Haimin Hu, Gabriele Dragotto, Zixu Zhang, Kaiqu Liang, Bartolomeo Stellato, Jaime F. Fisac
TL;DR
This work tackles the problem of determining the socially optimal order of play in an $N$-player Stackelberg trajectory game where self-interested agents must avoid collisions. It introduces Branch and Play (B&P), an exact branch-and-bound framework that implicitly explores the permutations of the order of play and uses a local Stackelberg equilibrium solver to bound each subproblem. A key component is Sequential Trajectory Planning (STP), which supplies fast, reliable local equilibria for B&P's bounds, along with practical enhancements such as expanded safety margins and safety filters. The authors validate B&P on air traffic control, quadrotor swarm formation, and hardware delivery fleets, showing consistent improvements over baselines in social cost and coordination efficiency. The approach offers a scalable, real-time solution for coordinating large robot teams under a regulator-driven order of play, with potential extensions to more general cost structures and learning-based subsystems.
Abstract
We consider the multi-agent spatial navigation problem of computing the socially optimal order of play, i.e., the sequence in which the agents commit to their decisions, and its associated equilibrium in an N-player Stackelberg trajectory game. We model this problem as a mixed-integer optimization problem over the space of all possible Stackelberg games associated with the order of play's permutations. To solve the problem, we introduce Branch and Play (B&P), an efficient and exact algorithm that provably converges to a socially optimal order of play and its Stackelberg equilibrium. As a subroutine for B&P, we employ and extend sequential trajectory planning (STP), a popular multi-agent control approach, to scalably compute valid local Stackelberg equilibria for any given order of play. We demonstrate the practical utility of B&P to coordinate air traffic control, swarm formation, and delivery vehicle fleets. We find that B&P consistently outperforms various baselines and computes the socially optimal equilibrium.
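To make the idea of searching over orders of play concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes a deliberately simplified setting in which each agent, in its turn, greedily claims the free integer slot closest to its preferred slot (a stand-in for sequential trajectory planning) and pays a weighted distance as its cost. All names (`greedy_slot`, `branch_and_play_toy`, `brute_force_toy`) and the slot model are hypothetical. Because later agents cannot change the choices of earlier ones in this toy, the cost accumulated by a partial order is a valid lower bound on every completion of that order, and the branch-and-bound search uses it to prune.

```python
from itertools import permutations


def greedy_slot(pref, weight, taken, n_slots):
    """Toy stand-in for one agent's planning step: pick the cheapest free slot."""
    free = [s for s in range(n_slots) if s not in taken]
    # Ties are broken toward the lower slot index so the result is deterministic.
    slot = min(free, key=lambda s: (weight * abs(s - pref), s))
    return slot, weight * abs(slot - pref)


def branch_and_play_toy(prefs, weights, n_slots):
    """Depth-first branch and bound over orders of play (toy model only)."""
    n = len(prefs)
    if n_slots < n:
        raise ValueError("need at least one slot per agent")
    best = {"cost": float("inf"), "order": None}

    def recurse(order, taken, cost):
        # Remaining agents add non-negative cost, so the cost of the
        # committed prefix is a lower bound for this whole subtree.
        if cost >= best["cost"]:
            return
        if len(order) == n:
            best["cost"], best["order"] = cost, tuple(order)
            return
        for i in range(n):
            if i in order:
                continue
            slot, c = greedy_slot(prefs[i], weights[i], taken, n_slots)
            recurse(order + [i], taken | {slot}, cost + c)

    recurse([], frozenset(), 0.0)
    return best["order"], best["cost"]


def brute_force_toy(prefs, weights, n_slots):
    """Reference solution: evaluate every permutation explicitly."""
    best_order, best_cost = None, float("inf")
    for order in permutations(range(len(prefs))):
        taken, cost = set(), 0.0
        for i in order:
            slot, c = greedy_slot(prefs[i], weights[i], taken, n_slots)
            taken.add(slot)
            cost += c
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost
```

Calling `branch_and_play_toy([2, 2, 2, 1], [5.0, 1.0, 3.0, 2.0], 5)` returns an order and its social cost. In this toy the cost matches the one found by `brute_force_toy`, while the pruned search skips any partial order that already costs at least as much as the incumbent.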
