Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many Robots

Haimin Hu; Gabriele Dragotto; Zixu Zhang; Kaiqu Liang; Bartolomeo Stellato; Jaime F. Fisac

TL;DR

This work addresses the problem of finding the socially optimal order of play in an $N$-player Stackelberg trajectory game in which self-interested agents must avoid collisions. It introduces Branch and Play (B&P), an exact branch-and-bound framework that implicitly explores permutations and relies on a local Stackelberg equilibrium solver to bound subproblems. A key component is Sequential Trajectory Planning (STP), which supplies fast local equilibria for B&P's bounds, together with practical enhancements such as expanded safety margins and safety filters. The authors validate B&P on air traffic control, quadrotor swarm formation, and hardware delivery fleets, showing consistent improvements over baselines in social cost and coordination efficiency. The approach offers a scalable, real-time way to coordinate large robot teams when a regulator sets the order of play, with potential extensions to more general cost structures and learning-based subsystems.

Abstract

We consider the multi-agent spatial navigation problem of computing the socially optimal order of play, i.e., the sequence in which the agents commit to their decisions, and its associated equilibrium in an N-player Stackelberg trajectory game. We model this problem as a mixed-integer optimization problem over the space of all possible Stackelberg games associated with the order of play's permutations. To solve the problem, we introduce Branch and Play (B&P), an efficient and exact algorithm that provably converges to a socially optimal order of play and its Stackelberg equilibrium. As a subroutine for B&P, we employ and extend sequential trajectory planning, i.e., a popular multi-agent control approach, to scalably compute valid local Stackelberg equilibria for any given order of play. We demonstrate the practical utility of B&P to coordinate air traffic control, swarm formation, and delivery vehicle fleets. We find that B&P consistently outperforms various baselines, and computes the socially optimal equilibrium.
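To make the idea of "optimizing the order of play" concrete, here is a minimal sketch on a toy problem. It is not the paper's trajectory game or its STP solver: the slot-assignment model, the names `PREFERRED`, `WEIGHTS`, `plan_in_order`, and `social_cost`, and all numbers are assumptions chosen for illustration. Agents commit in a given order, each picking its cheapest slot among those its predecessors left free, and the social cost is the sum of individual costs. Enumerating all permutations shows that the order alone changes the social cost.

```python
from itertools import permutations

# Hypothetical toy data: each agent wants one time slot; conflicts are forbidden.
PREFERRED = [1, 1, 1]   # preferred slot of each agent (assumed values)
WEIGHTS = [1, 5, 2]     # cost per slot of deviation from the preferred slot
NUM_SLOTS = 3


def plan_in_order(order):
    """Agents commit one by one; each takes its cheapest slot not yet taken."""
    taken = set()
    costs = {}
    for agent in order:
        free = [s for s in range(NUM_SLOTS) if s not in taken]
        # Ties are broken toward the lower slot index for determinism.
        slot = min(free, key=lambda s: (WEIGHTS[agent] * abs(s - PREFERRED[agent]), s))
        taken.add(slot)
        costs[agent] = WEIGHTS[agent] * abs(slot - PREFERRED[agent])
    return costs


def social_cost(order):
    """Sum of all agents' individual costs under the given order of play."""
    return sum(plan_in_order(order).values())


# Brute force over all orders of play (only viable for very small N).
best_order = min(permutations(range(len(PREFERRED))), key=social_cost)
print(best_order, social_cost(best_order))
```

In this toy instance, letting the agent with the largest deviation weight commit first gives the lowest social cost. Brute force scales factorially in the number of agents, which is the motivation for searching the permutation space implicitly.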

Paper Structure (32 sections, 6 theorems, 12 equations, 15 figures, 1 table, 2 algorithms)

Introduction
Related Work
Game-Theoretic Planning
Multilevel Optimization for Stackelberg Games
Multi-Robot Trajectory Planning
Problem Formulation
Branch and Play
Incomplete Permutations and Their Bounds
The Algorithm
Exploration, Pruning, and Convergence
Exploration strategies
Pruning strategies
Convergence
Receding Horizon Planning and Warmstart
Computing Stackelberg Equilibria with STP
...and 17 more sections

Key Result

Proposition 1

For any complete permutation $p \in {P}$, its value is an upper bound on the optimal social cost, i.e., ${\mathbf{J}}({\gamma}(p)) \ge {\mathbf{J}}^*({\gamma}(p^*))$.
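The upper bound in Proposition 1 is what allows a branch-and-bound search to prune: the value of any complete permutation found so far is an incumbent, and any incomplete permutation whose lower bound is no better can be discarded. The sketch below illustrates this on a toy slot-assignment problem. It is an assumption-laden stand-in, not the paper's B&P algorithm or its STP subgame solver: the model, the names `evaluate` and `branch_and_bound`, and the data are hypothetical. The lower bound for an incomplete permutation lets unassigned agents avoid only the assigned ones while ignoring each other, which mirrors the relaxation described for the search tree in Figure 4.

```python
from itertools import permutations

# Hypothetical toy data (not from the paper).
PREFERRED = [1, 1, 1, 3]   # preferred slot of each agent
WEIGHTS = [1, 5, 2, 1]     # cost per slot of deviation
NUM_SLOTS = 4              # must be at least the number of agents


def cost(agent, slot):
    return WEIGHTS[agent] * abs(slot - PREFERRED[agent])


def best_free_slot(agent, taken):
    free = [s for s in range(NUM_SLOTS) if s not in taken]
    return min(free, key=lambda s: (cost(agent, s), s))


def evaluate(prefix):
    """Return (bound, collision_free, unassigned) for a possibly incomplete order."""
    taken, total = set(), 0
    for agent in prefix:                      # assigned agents plan sequentially
        slot = best_free_slot(agent, taken)
        taken.add(slot)
        total += cost(agent, slot)
    rest = [a for a in range(len(PREFERRED)) if a not in prefix]
    # Relaxation: unassigned agents avoid assigned ones but ignore each other.
    relaxed = [best_free_slot(a, taken) for a in rest]
    total += sum(cost(a, s) for a, s in zip(rest, relaxed))
    return total, len(set(relaxed)) == len(relaxed), rest


def branch_and_bound():
    best_cost, best_order, explored = float("inf"), None, 0
    stack = [()]                              # depth-first search from the empty prefix
    while stack:
        prefix = stack.pop()
        explored += 1
        bound, collision_free, rest = evaluate(prefix)
        if bound >= best_cost:
            continue                          # prune: cannot beat the incumbent
        if collision_free:
            # The relaxed plan is feasible, so any completion attains the bound.
            best_cost, best_order = bound, prefix + tuple(rest)
            continue
        stack.extend(prefix + (a,) for a in rest)
    return best_cost, best_order, explored


print(branch_and_bound())
```

In this toy model the relaxed value is a valid lower bound because every completion leaves each unassigned agent a subset of the slots it sees in the relaxation. The search visits far fewer nodes than the full permutation tree while returning the same optimum as brute-force enumeration.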

Figures (15)

Figure 1: Our game-theoretic planning method computes the socially optimal Stackelberg equilibrium in real time. (a) Starting from an air traffic control zone with eight airplanes flying on collision courses, our method computes collision-free and socially optimal trajectories (warmer color denotes higher priority) compared to the baselines. (b) Our method handles moving targets when applied to a quadrotor swarm formation task in the AirSim simulator (Shah et al., 2018). (c) Our method coordinates a delivery vehicle fleet in a scaled metropolitan area (vehicle snapshots corresponding to later time steps have higher transparency). The video is available at https://youtu.be/wb6cMYJ43-s
Figure 2: An overview of B&P applied to air traffic control, where we employ STP as the subgame solver. B&P computes the socially optimal order of play (the jet is the leader, followed by the helicopter, and finally the quadrotor) and broadcasts it to all airborne agents in the zone. Once the regulator broadcasts the order of play, STP, conditioned on the socially optimal order of play, runs at a higher frequency: each agent optimizes its own trajectory based on its predecessors' plans and communicates this information to its successors.
Figure 3: A ROS-based implementation of B&P for coordinating a delivery vehicle fleet in a scaled metropolis. Each truck performs onboard computation with an Nvidia Jetson Xavier NX computer, which runs a visual-inertial SLAM algorithm for localization. The computer also solves the STP for the ego vehicle's actions (acceleration and steering angle) based on the leading vehicles' planned trajectories, which are communicated wirelessly. The B&P is solved on a desktop and the optimal permutation $p$ is then broadcast to each truck.
Figure 4: Illustration of a search tree. Trajectories of unassigned players, who avoid all assigned players, are plotted in grey. In the root node Ⓡ, all players are unaware of each other, resulting in a lower bound on the social cost. Players in nodes ③, ④, and ⑤ are collision-free. Unassigned players in those nodes can be given any order of play, so there is no need to descend further. Nodes ④ and ⑤ are pruned since they produce a higher cost than the feasible solution Ⓢ, which turns out to be the optimal solution.
Figure 5: Computation time for obtaining the socially optimal permutation over $100$ randomized trials of the example. Solid lines and shaded areas represent the sample mean and standard deviation, respectively. Brute-force computation times for $N \in [7,10]$ are indicated above the black triangles.
...and 10 more figures

Theorems & Definitions (23)

Definition 1: Global Stackelberg equilibrium
Definition 2: Local Stackelberg equilibrium
Remark 1
Definition 3: Permutation and order of play
Definition 4: Socially optimal equilibrium
Example 1
Remark 2
Definition 5: Value of a permutation
Definition 6: Incomplete permutation
Proposition 1: Upper bound
...and 13 more
