
Learning Nash Equilibrial Hamiltonian for Two-Player Collision-Avoiding Interactions

Lei Zhang, Siddharth Das, Tanner Merry, Wenlong Zhang, Yi Ren

TL;DR

Solving the Hamilton-Jacobi-Isaacs equations of two-player collision-avoidance differential games in real time is intractable because the equilibrium values are discontinuous. The authors propose a data-efficient approach that directly learns the equilibrium co-states $\boldsymbol{bb}^*(\boldsymbol{x}_0,t_0)$ under linear dynamics and a collision penalty, so that equilibrium actions can be obtained through local Hamiltonian maximization. They augment this with theory-driven active learning guided by Pontryagin's Maximum Principle via an inverse value problem. They prove that, in the collision-avoidance setting with linear dynamics, equilibrium co-states admit a low-dimensional, piecewise-linear representation summarized by $y_i=(\boldsymbol{bb}_i^{*}(T),t_i^{\mathrm{in}},t_i^{\mathrm{out}},q_i)$. Experiments on an uncontrolled intersection show that co-state networks yield lower collision probabilities than value-based models under the same data budget, and that active learning offers additional gains at intermediate data sizes. The results suggest a scalable, principled path to near-real-time Nash-equilibrium policies for two-player collision-avoiding tasks, with potential extensions to nonlinear dynamics and multi-agent scenarios.
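To make "local Hamiltonian maximization" concrete, here is a minimal sketch for a single player. Everything specific in it is an assumption for illustration and is not taken from the paper: a double-integrator model (position d, speed v, acceleration u), a running loss of c·u², the sign convention H = λ·f − L, the action bounds, and the hypothetical names `hamiltonian` and `action_from_costate`. The collision penalty is left out because, in this toy setup, it does not depend on u.

```python
from typing import Tuple

Costate = Tuple[float, float]  # (lambda_d, lambda_v); hypothetical layout


def hamiltonian(costate: Costate, v: float, u: float, c: float = 1.0) -> float:
    """Assumed Hamiltonian: H = lambda_d * v + lambda_v * u - c * u**2."""
    lam_d, lam_v = costate
    return lam_d * v + lam_v * u - c * u ** 2


def action_from_costate(
    costate: Costate,
    c: float = 1.0,
    u_min: float = -5.0,   # hypothetical lower acceleration bound
    u_max: float = 10.0,   # hypothetical upper acceleration bound
) -> float:
    """Maximize the assumed H over u in [u_min, u_max].

    H is concave in u, so the maximizer is the stationary point
    lambda_v / (2c) clipped to the admissible interval.
    """
    _, lam_v = costate
    u_star = lam_v / (2.0 * c)
    return min(max(u_star, u_min), u_max)
```

Under these assumptions, `action_from_costate((0.3, 4.0))` returns `2.0`, and a very large co-state component saturates at the bound. The point of the sketch is only that, once a co-state is predicted, the action follows from a cheap pointwise optimization rather than from differentiating a value network.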

Abstract

We consider the problem of learning Nash equilibrial policies for two-player risk-sensitive collision-avoiding interactions. Solving the Hamilton-Jacobi-Isaacs equations of such general-sum differential games in real time is an open challenge due to the discontinuity of equilibrium values on the state space. A common solution is to learn a neural network that approximates the equilibrium Hamiltonian for given system states and actions. The learning, however, is usually supervised and requires a large number of sample equilibrium policies from different initial states in order to mitigate the risk of collisions. This paper claims two contributions towards more data-efficient learning of equilibrium policies. First, instead of computing the Hamiltonian through a value network, we show that the equilibrium co-states have simple structures when collision avoidance dominates the agents' loss functions and the system dynamics are linear, and are therefore more data-efficient to learn. Second, we introduce theory-driven active learning to guide data sampling, where the acquisition function measures the compliance of the predicted co-states with Pontryagin's Maximum Principle. On an uncontrolled intersection case, the proposed method leads to a more generalizable approximation of the equilibrium policies and, in turn, lower collision probabilities than the state of the art under the same data acquisition budget.
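One way to picture an acquisition function that measures compliance with Pontryagin's Maximum Principle is as a residual of the co-state equations along a predicted co-state trajectory. The sketch below is an assumption-laden toy, not the paper's acquisition function: it assumes a double-integrator player with state (d, v), no state-dependent loss away from the collision zone, and therefore co-state equations dλ_d/dt = 0 and dλ_v/dt = −λ_d. The names `pmp_residual`, `pick_next_query`, and the `predict` callback are hypothetical.

```python
from typing import Callable, Hashable, Sequence, Tuple

Costate = Tuple[float, float]  # (lambda_d, lambda_v); hypothetical layout


def pmp_residual(traj: Sequence[Costate], dt: float) -> float:
    """Mean squared violation of the assumed co-state equations.

    Assumed equations: d(lambda_d)/dt = 0 and d(lambda_v)/dt = -lambda_d,
    checked with forward finite differences of step dt.
    """
    total = 0.0
    for (ld0, lv0), (ld1, lv1) in zip(traj, traj[1:]):
        r_d = (ld1 - ld0) / dt        # zero if lambda_d is constant
        r_v = (lv1 - lv0) / dt + ld0  # zero if lambda_v decays at rate lambda_d
        total += r_d ** 2 + r_v ** 2
    return total / max(len(traj) - 1, 1)


def pick_next_query(
    candidates: Sequence[Hashable],
    predict: Callable[[Hashable], Sequence[Costate]],
    dt: float,
) -> Hashable:
    """Pick the candidate whose predicted co-states violate the equations most."""
    return max(candidates, key=lambda x0: pmp_residual(predict(x0), dt))
```

In this toy, a candidate initial state with a large residual is one where the model's prediction is least consistent with the assumed optimality conditions, so it is the one that would be sent to the ground-truth solver next.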

Paper Structure

This paper contains 12 sections, 14 equations, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: (a) Four types of collisions from two players at an uncontrolled intersection, characterized by the entrance and exit facets of the collision box. 1-4: Player 1 moves towards Player 2, who then evades; Player 1 overtakes Player 2; Player 2 overtakes Player 1; Player 2 moves towards Player 1, who then evades. (b) Collision zone for a three-player case, where the types of collisions (and interactions) can still be determined by the finite number of entrance and exit facets.
  • Figure 2: (a) Uncontrolled intersection setup with two players. (b) 3D visualization of the collision penalty function $\phi_i(x)$: the dark purple region shows the actual collision area, while the red region represents a safety buffer to avoid collisions.
  • Figure 3: (a) Trajectories in the space of $d_1$ and $d_2$ driven by the equilibrium policies starting from different initial states. (b) The corresponding co-state trajectories over time. (c) Equilibrium value landscape across $d_1$ and $d_2$ with $t_0=0$ and $v_1=v_2=18\,\mathrm{m/s}$. (d) Landscape of the equilibrium co-state parameter $q_1$ under the same settings as (c). (e, f) Equilibrium values and $q_1$ in the space of $d_1$ and $t$, where $d_2=17.5\,\mathrm{m}$ and $v_1=v_2=18\,\mathrm{m/s}$.
  • Figure 4: (a) Comparison of collision probabilities under closed-loop control using the value network (blue), the statically learned co-state network (green), and the actively learned co-state network (orange). The p-values (green and orange) are for one-sided t-tests between the value and co-state networks, and between the active and static co-state networks, respectively. Statistically significant p-values are highlighted. (b) Generalization performance of the three models on co-state prediction across training data sizes. 1-2: Co-state prediction errors for Player 1; 3-4: for Player 2.
  • Figure 5: Test interaction trajectories from (a) the ground truth (BVP solver), and closed-loop control based on (b) the value network, (c) the statically trained co-state network, and (d) the actively trained co-state network. Training data size is 250. The grey box is the collision zone.