A Data Driven Structural Decomposition of Dynamic Games via Best Response Maps

Mahdis Rabbani; Navid Mojahed; Shima Nazari

TL;DR

This paper addresses the computation of generalized Nash equilibria (GNE) in dynamic games through an asymmetric, data-driven structural reduction. The opponent's online best-response (BR) block is replaced by an offline-learned BR map embedded as a feasibility constraint, which yields a reduced problem that standard NLP/MCP solvers can handle without differentiating through the BR. The authors prove that, when the BR is exact, solutions of the reduced problem correspond to local open-loop GNE, and they show that a learned BR surrogate gives approximate equilibrium consistency up to the BR approximation error. On a two-player autonomous racing benchmark, the method achieves ego performance competitive with full-information baselines while operating under asymmetric information, and the results tie safety to BR approximation quality.

Abstract

Dynamic games are powerful tools to model multi-agent decision-making, yet computing Nash (or generalized Nash) equilibria remains a central challenge in such settings. Complexity arises from tightly coupled optimality conditions, nested optimization structures, and poor numerical conditioning. Existing game-theoretic solvers address these challenges by directly solving the joint game, typically requiring explicit modeling of all agents' objective functions and constraints, while learning-based approaches often decouple interaction through prediction or policy approximation, sacrificing equilibrium consistency. This paper introduces a conceptually novel formulation for dynamic games by restructuring the equilibrium computation. Rather than solving a fully coupled game or decoupling agents through prediction or policy approximation, a data-driven structural reduction of the game is proposed that removes nested optimization layers and derivative coupling by embedding an offline-compiled best-response map as a feasibility constraint. Under standard regularity conditions, when the best-response operator is exact, any converged solution of the reduced problem corresponds to a local open-loop (generalized) Nash equilibrium of the original game; with a learned surrogate, the solution is approximately equilibrium-consistent up to the best-response approximation error. The proposed formulation is supported by mathematical proofs and by a large-scale Monte Carlo study in a two-player open-loop dynamic game motivated by the autonomous racing problem. Comparisons are made against state-of-the-art joint game solvers, and results are reported on solution quality, computational cost, and constraint satisfaction.
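To make the phrase "without derivative coupling" concrete, the following is a minimal sketch on a hypothetical static two-player game, not the paper's racing benchmark. The costs, the closed-form map `best_response_2`, and the bisection helper are all assumptions chosen for illustration. The reduced residual pairs Player 1's partial stationarity condition with the best-response map used purely as a constraint; differentiating through the map instead would give a leader-follower (Stackelberg) condition with a different solution.

```python
# Toy static game, purely illustrative (hypothetical costs, not the paper's benchmark):
#   J1(u1, u2) = (u1 - 1)^2 + u1 * u2     ego (Player 1)
#   J2(u1, u2) = (u2 - 0.5 * u1)^2        opponent (Player 2), so BR2(u1) = 0.5 * u1


def best_response_2(u1):
    """Stand-in for an offline-compiled best-response map of Player 2."""
    return 0.5 * u1


def dJ1_du1(u1, u2):
    """Partial derivative of J1 with respect to u1, with u2 held fixed."""
    return 2.0 * (u1 - 1.0) + u2


def reduced_residual(u1):
    """Player 1 stationarity with u2 tied to the best-response map.

    The map enters only as a feasibility constraint: no derivative of
    best_response_2 appears here.
    """
    return dJ1_du1(u1, best_response_2(u1))


def stackelberg_residual(u1):
    """For contrast: total derivative of J1 along the best-response map."""
    u2 = best_response_2(u1)
    dJ1_du2 = u1      # partial derivative of J1 with respect to u2
    dBR_du1 = 0.5     # slope of the best-response map
    return dJ1_du1(u1, u2) + dJ1_du2 * dBR_du1


def bisect(f, lo, hi, tol=1e-12):
    """Scalar root finding by bisection; assumes f changes sign on [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


u1_reduced = bisect(reduced_residual, 0.0, 2.0)
u2_reduced = best_response_2(u1_reduced)
u1_stackelberg = bisect(stackelberg_residual, 0.0, 2.0)
```

In this toy game the reduced residual vanishes at `u1 = 0.8`, `u2 = 0.4`, where each player's own partial stationarity condition holds, while the Stackelberg residual vanishes at `u1 = 2/3`. The gap between the two is what separates a Nash-type solution from a leader-follower one.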

Paper Structure (24 sections, 1 theorem, 41 equations, 6 figures, 5 tables)

Introduction
Technical Contribution
Related Work
Problem Statement
Decomposition via Best-Response Map
Illustrative Example
Implementation
Nash Interpretation
Data-Driven Best-Response Operator
Data-Driven Approximation
Two-Player Racing Simulation
Benchmark Problem: Two-Player Racing on a Constant-Curvature Track
Best-Response Surrogate: Architecture & Training
Architecture & Bounds.
Training Loss.
...and 9 more sections

Key Result

Theorem 1

Let $\Gamma^\star=(Z_1^\star,\Lambda_1^\star,Z_2^\star)$ be a solution of the reduced KKT system (the reduced MCP residual, labeled eq:reduced_mcp_residual in the paper) together with the complementarity bounds. Then $(Z_1^\star,Z_2^\star)$ is a local open-loop generalized Nash equilibrium of the original game.
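As a rough illustration of what "local generalized Nash equilibrium" means operationally, the sketch below samples small unilateral deviations on a hypothetical scalar game with one shared constraint. The costs, the constraint, and the helper name `is_local_gne` are assumptions made for this example and do not come from the paper. A sampled check of this kind is only a necessary-condition test, not a proof of equilibrium.

```python
# Hypothetical scalar game used only to illustrate the definition:
#   J1(u1, u2) = (u1 - 1)^2 + u1 * u2
#   J2(u1, u2) = (u2 - 0.5 * u1)^2
#   shared constraint: |u1 - u2| >= 0.1  (a collision-like coupling, assumed)


def J1(u1, u2):
    return (u1 - 1.0) ** 2 + u1 * u2


def J2(u1, u2):
    return (u2 - 0.5 * u1) ** 2


def feasible(u1, u2):
    return abs(u1 - u2) >= 0.1


def is_local_gne(u1, u2, radius=0.05, steps=50, tol=1e-12):
    """Sampled check: no feasible unilateral deviation within `radius`
    lowers the deviating player's own cost by more than `tol`."""
    if not feasible(u1, u2):
        return False
    for i in range(-steps, steps + 1):
        d = radius * i / steps
        # Player 1 deviates while Player 2's decision stays fixed.
        if feasible(u1 + d, u2) and J1(u1 + d, u2) < J1(u1, u2) - tol:
            return False
        # Player 2 deviates while Player 1's decision stays fixed.
        if feasible(u1, u2 + d) and J2(u1, u2 + d) < J2(u1, u2) - tol:
            return False
    return True


nash_candidate = is_local_gne(0.8, 0.4)
leader_follower_candidate = is_local_gne(2.0 / 3.0, 1.0 / 3.0)
```

For this toy game the point `(0.8, 0.4)` passes the check, whereas the leader-follower point `(2/3, 1/3)` fails it because Player 1 can lower its cost by increasing `u1` while `u2` is held fixed.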

Figures (6)

Figure 1: Overall Pipeline. During the training process on the left, the MLP learns the policy $\pi_\theta$ (see \ref{sec:exp_br_mlp}) using three different loss functions. Then, the best-response approximation $\widehat{\mathcal{B}}_2$ is obtained by combining the learned policy with the system dynamics of Player 2. Finally, we solve the proposed reduced KKT conditions using the approximate best-response surrogate.
Figure 2: Reduced formulation performance (ours) over $N=1200$ Monte Carlo instances. (a) IPOPT termination status counts. (b) Empirical CDF (ECDF) of wall-clock solve time (log-scale x-axis). (c) Distribution of IPOPT iteration counts on converged runs (clipped at the 99th percentile for readability).
Figure 3: Constraint diagnostics for the reduced formulation (ours). (a) Aggregate infeasibility score $s_{\mathrm{infeas}}$ stratified by termination status (log-scale x-axis). (b) Minimum collision margin $\min_k(\|p_{1,k}-p_{2,k}\|_2-d_{\mathrm{safe}})$; values below zero indicate safety-distance violations.
Figure 4: Comparison to full-information baselines over $N=1200$ Monte Carlo instances. (a) Iterations on successful runs. (b) Minimum collision margin on successful runs. (c) Paired ego-cost differences $\Delta J_1 := J_1^{\mathrm{ours}}-J_1^{\mathrm{base}}$ on instances where both solvers succeeded (negative is better for ours). (d) Outcome counts (success vs. non-success).
Figure 5: Solver outcomes across the distribution of initial states used in the benchmark. Each marker corresponds to one Monte Carlo initial-condition pair.
...and 1 more figures
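The two diagnostics named in the captions of Figures 3 and 4 can be computed directly from their formulas. The sketch below assumes trajectories are given as lists of 2D position tuples and that a failed solve is marked with `None`; both conventions, along with the function names, are assumptions for illustration rather than the authors' data format.

```python
import math


def min_collision_margin(p1, p2, d_safe):
    """min over k of (||p1_k - p2_k||_2 - d_safe); negative means a violation."""
    return min(math.dist(a, b) - d_safe for a, b in zip(p1, p2))


def paired_cost_differences(j_ours, j_base):
    """Delta J1 = J1_ours - J1_base on instances where both solvers succeeded.

    A failed run is represented by None (assumed convention) and is skipped.
    """
    return [
        ours - base
        for ours, base in zip(j_ours, j_base)
        if ours is not None and base is not None
    ]


# Small made-up example, not data from the paper.
p1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
p2 = [(0.0, 3.0), (1.0, 2.0), (2.0, 4.0)]
margin = min_collision_margin(p1, p2, d_safe=1.5)

deltas = paired_cost_differences([1.0, None, 2.5], [1.5, 2.0, None])
```

In the made-up example the closest approach is 2.0 at the second step, so the margin is 0.5, and only the first instance contributes a paired difference of -0.5 (negative favors the reduced formulation).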

Theorems & Definitions (5)

Definition 1
Theorem 1
proof
Remark 1: Approximate equilibrium consistency
Remark 2
