STLGame: Signal Temporal Logic Games in Adversarial Multi-Agent Systems

Shuo Yang, Hongrui Zheng, Cristian-Ioan Vasile, George Pappas, Rahul Mangharam

TL;DR

STLGame addresses robust policy synthesis for STL tasks in adversarial continuous multi-agent environments. It formulates the problem as a two-player zero-sum stochastic game and solves it via fictitious play, using a gradient-based best response (BR) computed from differentiable STL robustness to converge toward a Nash equilibrium (NE) policy profile. The gradient-based BR outperforms reinforcement learning baselines in convergence and sample efficiency, achieving near-zero exploitability on Ackermann steering vehicles and autonomous drones. The resulting policies are robust to unseen opponents, highlighting the practical potential of STL-based adversarial planning for safety-critical autonomous systems.
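To make the fictitious-play and exploitability ideas concrete, here is a minimal sketch on a toy discrete game (rock-paper-scissors). It is not the paper's algorithm: STLGame works in continuous state-action spaces with learned policies, whereas this stand-in uses a payoff matrix and exact best responses. The function names and the exploitability definition (the sum of what each player could gain by deviating) are assumptions chosen for illustration.

```python
import numpy as np

def fictitious_play(A, iters=5000):
    """Simultaneous fictitious play on a zero-sum matrix game.

    A[i, j] is the payoff to the row (ego) player; the column
    (opponent) player receives -A[i, j]. Returns the empirical
    average strategies of both players.
    """
    n, m = A.shape
    counts_x = np.zeros(n)
    counts_y = np.zeros(m)
    counts_x[0] = 1.0  # arbitrary initial pure strategies
    counts_y[0] = 1.0
    for _ in range(iters):
        x_bar = counts_x / counts_x.sum()
        y_bar = counts_y / counts_y.sum()
        # Each player best-responds to the opponent's average strategy.
        br_x = np.argmax(A @ y_bar)  # ego maximizes its payoff
        br_y = np.argmin(x_bar @ A)  # opponent minimizes it
        counts_x[br_x] += 1.0
        counts_y[br_y] += 1.0
    return counts_x / counts_x.sum(), counts_y / counts_y.sum()

def exploitability(A, x, y):
    """Total gain available to best-responding deviators (0 at a Nash equilibrium)."""
    return float(np.max(A @ y) - np.min(x @ A))

# Rock-paper-scissors payoff matrix for the row player.
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])

x_avg, y_avg = fictitious_play(RPS)
print("average strategies:", x_avg, y_avg)
print("exploitability:", exploitability(RPS, x_avg, y_avg))
```

In this toy game the average strategies drift toward the uniform mixture and the exploitability shrinks as iterations accumulate, which is the same quantity the paper reports as near zero for its converged policies.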

Abstract

We study how to synthesize a robust and safe policy for autonomous systems under signal temporal logic (STL) tasks in adversarial settings against unknown dynamic agents. To ensure the worst-case STL satisfaction, we propose STLGame, a framework that models the multi-agent system as a two-player zero-sum game, where the ego agents try to maximize the STL satisfaction and other agents minimize it. STLGame aims to find a Nash equilibrium policy profile, which is the best case in terms of robustness against unseen opponent policies, by using the fictitious self-play (FSP) framework. FSP iteratively converges to a Nash profile, even in games set in continuous state-action spaces. We propose a gradient-based method with differentiable STL formulas, which is crucial in continuous settings to approximate the best responses at each iteration of FSP. We show this key aspect experimentally by comparing with reinforcement learning-based methods to find the best response. Experiments on two standard dynamical system benchmarks, Ackermann steering vehicles and autonomous drones, demonstrate that our converged policy is almost unexploitable and robust to various unseen opponents' policies. All code and additional experimental results can be found on our project website: https://sites.google.com/view/stlgame
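As a rough illustration of what a differentiable STL robustness looks like, the sketch below scores a 2D trajectory against a reach-avoid specification ("eventually enter the goal region and always stay outside the obstacle"). The specification, the log-sum-exp smoothing, the temperature `beta`, and all function names are assumptions made for this example; the abstract does not state which smooth approximation the authors use.

```python
import numpy as np

def soft_max(values, beta=10.0):
    """Smooth upper bound on max(values) via log-sum-exp."""
    v = np.asarray(values, dtype=float)
    m = v.max()  # subtract the max for numerical stability
    return float(m + np.log(np.exp(beta * (v - m)).sum()) / beta)

def soft_min(values, beta=10.0):
    """Smooth lower bound on min(values)."""
    return -soft_max(-np.asarray(values, dtype=float), beta)

def reach_avoid_robustness(traj, goal, obstacle, r_goal, r_obs, beta=10.0):
    """Smoothed robustness of F(near goal) AND G(clear of obstacle).

    traj has shape (T, 2). A positive value indicates the
    specification is satisfied with margin; a negative value
    indicates a violation.
    """
    d_goal = np.linalg.norm(traj - goal, axis=1)
    d_obs = np.linalg.norm(traj - obstacle, axis=1)
    reach = soft_max(r_goal - d_goal, beta)  # "eventually": max over time
    avoid = soft_min(d_obs - r_obs, beta)    # "always": min over time
    return soft_min([reach, avoid], beta)    # conjunction: min of the two

# A straight-line trajectory from (0, 0) to (4, 0).
line = np.stack([np.linspace(0.0, 4.0, 5), np.zeros(5)], axis=1)
goal = np.array([4.0, 0.0])

safe = reach_avoid_robustness(line, goal, np.array([2.0, 2.0]), 0.5, 1.0)
unsafe = reach_avoid_robustness(line, goal, np.array([2.0, 0.0]), 0.5, 1.0)
print("obstacle off the path:", safe)
print("obstacle on the path:", unsafe)
```

Because every operation here is smooth, the same expression written in an automatic-differentiation framework yields gradients of the robustness with respect to the trajectory, which is the property a gradient-based best response relies on.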

Paper Structure

This paper contains 15 sections, 20 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: STL Gradient-based Best Response
  • Figure 2: Vehicle trajectories sampled from some FSP iterations and finally from the Nash profile. In the trajectories sampled from the Nash profile, the opponent is trying its best to block the ego vehicle.
  • Figure 3: Drone trajectories from a randomly initialized policy (Left) to FSP iteration 1 (Middle) and finally the Nash profile (Right). For better visualizations, please see our project website: https://sites.google.com/view/stlgame.
  • Figure 4: Exploitability for Ackermann Steering Vehicles (Left) and Autonomous Drones (Right). We run five different initial conditions for each environment.
  • Figure 5: Sampled vehicle trajectories when the ego vehicle is playing a Nash policy, and the opponent is playing a Nash policy (Left) or some unseen policy (Unseen 1, Unseen 2, Unseen 3).
  • ...and 2 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2