Learning and steering game dynamics towards desirable outcomes

Ilayda Canyakmaz; Iosif Sakos; Wayne Lin; Antonios Varvitsiotis; Georgios Piliouras

TL;DR

The work tackles steering evolving game dynamics toward desirable equilibria when the underlying update rules are unknown and data are scarce. It introduces SIAR-MPC, which couples Side Information Assisted Regression with Model Predictive Control by extending SIAR to learn controlled dynamics with constraints like robust forward invariance and positive correlation, and then applying MPC to compute dynamic incentives. Across coordination and zero-sum games, including chaotic regimes, SIAR-MPC achieves convergence to socially optimal equilibria and stabilizes oscillatory behavior using far fewer training samples than competing methods such as SINDYc and PINN-MPC. This data-efficient framework holds promise for real-time, constraint-aware policy design in strategic environments where model-free controllers would struggle with limited observations.

Abstract

Game dynamics, which describe how agents' strategies evolve over time based on past interactions, can exhibit a variety of undesirable behaviors, including convergence to suboptimal equilibria, cycling, and chaos. While central planners can employ incentives to mitigate such behaviors and steer game dynamics towards desirable outcomes, the effectiveness of such interventions critically relies on accurately predicting agents' responses to these incentives -- a task made particularly challenging when the underlying dynamics are unknown and observations are limited. To address this challenge, this work introduces the Side Information Assisted Regression with Model Predictive Control (SIAR-MPC) framework. We extend the recently introduced SIAR method to incorporate the effect of control, enabling it to utilize side-information constraints inherent to game-theoretic applications to model agents' responses to incentives from scarce data. MPC then leverages this model to implement dynamic incentive adjustments. Our experiments demonstrate the effectiveness of SIAR-MPC in guiding systems towards socially optimal equilibria and in stabilizing chaotic and cycling behaviors. Notably, it achieves these results in data-scarce settings with few learning samples, where well-known system identification methods paired with MPC are less effective.
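The two-step structure described in the abstract (identify a controlled model from samples, then steer with receding-horizon control) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the matching pennies payoffs, the additive form in which the incentive `u` enters the replicator dynamics, the sample count, and all function names. In particular, the regression below is plain unconstrained polynomial least squares, not SIAR (it imposes no side-information constraints), and the control step is a coarse grid search over constant inputs rather than a proper MPC solver.

```python
import itertools
import numpy as np

def controlled_replicator(state, u):
    """Ground truth used only to generate data: replicator dynamics for
    matching pennies, with incentives u added to the payoff gap (assumed form)."""
    x, y = state
    return np.array([x * (1 - x) * (4 * y - 2 + u[0]),
                     y * (1 - y) * (2 - 4 * x + u[1])])

def exponents(n_vars, degree):
    """Exponent vectors of all monomials in n_vars variables up to total degree."""
    return np.array([e for e in itertools.product(range(degree + 1), repeat=n_vars)
                     if sum(e) <= degree])

EXPS = exponents(4, 3)  # monomials in (x, y, u1, u2) of degree <= 3

def fit_model(states, controls, derivs):
    """Unconstrained polynomial least squares (no side-information constraints)."""
    Z = np.hstack([states, controls])
    Phi = np.prod(Z[:, None, :] ** EXPS[None], axis=2)  # one row of monomials per sample
    coef, *_ = np.linalg.lstsq(Phi, derivs, rcond=None)
    return lambda s, u: np.prod(np.concatenate([s, u]) ** EXPS, axis=1) @ coef

def mpc_step(model, state, target, candidates, horizon=10, dt=0.05):
    """Pick the constant candidate control with the lowest predicted cost."""
    best_u, best_cost = None, np.inf
    for u in candidates:
        s, cost = state.copy(), 0.0
        for _ in range(horizon):
            s = s + dt * model(s, u)  # Euler rollout of the learned model
            cost += np.sum((s - target) ** 2)
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

# Step 1: system identification from noise-free samples.
rng = np.random.default_rng(0)
states = rng.uniform(0.05, 0.95, size=(60, 2))
controls = rng.uniform(-2.0, 2.0, size=(60, 2))
derivs = np.array([controlled_replicator(s, u) for s, u in zip(states, controls)])
model = fit_model(states, controls, derivs)

# Step 2: receding-horizon control towards the mixed equilibrium (0.5, 0.5).
target = np.array([0.5, 0.5])
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
candidates = [np.array(u) for u in itertools.product(grid, grid)]
state = np.array([0.2, 0.3])
initial_dist = np.linalg.norm(state - target)
for _ in range(200):
    u = mpc_step(model, state, target, candidates)
    state = state + 0.05 * controlled_replicator(state, u)  # true system, Euler step
final_dist = np.linalg.norm(state - target)
print(initial_dist, final_dist)
```

Because this sketch has no side-information constraints, it needs at least as many samples as there are monomials (35 here) to pin down the model, which is the kind of data requirement the abstract says SIAR is designed to reduce.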

Paper Structure (29 sections, 5 theorems, 54 equations, 4 figures, 8 tables)

Introduction
Related Work
Preliminaries
The SIAR-MPC Framework
The System Identification Step
The Control Step
Experiments
Stag Hunt Game
Zero-Sum Games & Chaos
Matching Pennies Game
$\epsilon$-RPS Game
Experiments with different initializations and payoff matrices
Conclusion
Additional Experimental Results
Performance Across Multiple Initial Conditions
...and 14 more sections

Key Result

theorem 1

Fix time horizon $T$, desired approximation accuracy $\epsilon > 0$, and desired side information accuracy $\delta > 0$. For any continuously differentiable dynamics $f$ satisfying side information constraints, there exists polynomial dynamics $p$ that $\delta$-satisfies the same side-information co

Figures (4)

Figure 1: Replicator dynamics trajectories in the matching pennies game with and without control. Starting from three initial conditions, (left) without control the system cycles around the equilibrium; (right) with control, trajectories are guided towards the specified equilibrium (indicated by $\star$).
Figure 2: Performance comparison of SINDY-MPC (left), PINN-MPC (center), and SIAR-MPC (right) in steering the replicator dynamics for the stag hunt game.
Figure 3: Performance comparison of PINN-MPC and SIAR-MPC steering the log-barrier dynamics for the matching pennies game (left) and the replicator dynamics for a $0.25$-RPS game (right).
Figure 4: Vector field of the steering direction of SIAR-MPC when the additional state avoidance constraint is imposed (left) and when it is not imposed (right). The restricted area is shaded in blue. The green line corresponds to a controlled trajectory of the replicator dynamics for the matching pennies game.

Theorems & Definitions (8)

theorem 1: Informal
lemma 1: see e.g., stone1948generalized
lemma 2: Grönwall-Bellman Inequality (bellman1943stability)
proposition 1
proof
proposition 2
proof
proof
