Coupled Wasserstein Gradient Flows for Min-Max and Cooperative Games

Lauren Conger, Franca Hoffmann, Eric Mazumdar, Lillian J. Ratliff

TL;DR

This paper models two-player interactions over distributions via coupled Wasserstein gradient flows, formulating min-max and cooperative two-species PDEs that evolve in Wasserstein-2 space. The authors establish existence and uniqueness of steady states and Nash equilibria, and prove exponential convergence to these equilibria under displacement convexity/concavity, using an HWI-type inequality and a Danskin-type Gamma-convergence approach. They extend the theory to timescale-separated regimes arising in ML-driven distribution shift, with explicit rates and an interpretation of how algorithmic updates interact with strategic populations. Numerical experiments on real data (Colombia census, loan applications) and on performative prediction show distribution-level effects and the need to model intra-population interactions rather than moments alone. The framework gives a principled way to analyze infinite-dimensional game dynamics and distribution shift in ML systems with strategic agents.

Abstract

We propose a framework for two-player infinite-dimensional games with cooperative or competitive structure. These games take the form of coupled partial differential equations in which players optimize over a space of measures, driven by either a gradient descent or gradient descent-ascent in Wasserstein-2 space. We characterize the properties of the Nash equilibrium of the system, and relate it to the steady state of the dynamics. In the min-max setting, we show, under sufficient convexity conditions, that solutions converge exponentially fast and with explicit rate to the unique Nash equilibrium. Similar results are obtained for the cooperative setting. We apply this framework to distribution shift induced by interactions among a strategic population of agents and an algorithm, proving additional convergence results in the timescale-separated setting. We illustrate the performance of our model on (i) real data from an economics study on Colombia census data, (ii) feature modification in loan applications, and (iii) performative prediction. The numerical experiments demonstrate the importance of distribution-level, rather than moment-level, modeling.
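To make the phrase "gradient descent-ascent in Wasserstein-2 space" concrete, here is a minimal particle sketch. It is an illustration under stated assumptions, not the authors' numerical scheme: the payoff `f(x, y) = 0.5*x**2 + x*y - 0.5*y**2`, the function name `simulate_gda`, and all parameter values are hypothetical, and the sketch omits the entropy and interaction-kernel terms that the paper's energies include. With the energy taken as the average of `f` over both populations, each minimizing particle follows the negative gradient of the first variation in `x`, and each maximizing particle follows the positive gradient in `y`.

```python
import random

def simulate_gda(n=200, dt=0.01, steps=2000, seed=0):
    """Explicit-Euler particle approximation of a min-max flow.

    Assumed toy energy: F(rho, mu) = E_{x~rho, y~mu}[0.5*x^2 + x*y - 0.5*y^2].
    The x-particles (population rho) descend, the y-particles (mu) ascend.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(2.0, 1.0) for _ in range(n)]   # samples of rho_0
    ys = [rng.gauss(-1.0, 1.0) for _ in range(n)]  # samples of mu_0
    for _ in range(steps):
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        # Gradient in x of the first variation w.r.t. rho: x + mean(y)
        new_xs = [x - dt * (x + mean_y) for x in xs]
        # Gradient in y of the first variation w.r.t. mu: mean(x) - y
        new_ys = [y + dt * (mean_x - y) for y in ys]
        xs, ys = new_xs, new_ys
    return xs, ys

xs, ys = simulate_gda()
# Largest distance of any particle from the equilibrium point (0, 0)
spread = max(max(abs(x) for x in xs), max(abs(y) for y in ys))
print(spread)
```

In this toy case the payoff is strongly convex in `x` and strongly concave in `y`, so both particle clouds contract toward a point mass at the origin, which is the Nash equilibrium of the toy game. Without a diffusion term the equilibrium is degenerate; the densities studied in the paper would require adding noise or a grid-based PDE solver.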

Paper Structure

This paper contains 30 sections, 43 theorems, 283 equations, 4 figures.

Key Result

Theorem 3.3

Suppose that Assumptions assump:f_lower(i), assump:V_lower, and assump:W_lower are satisfied with [...]. Consider solutions $\gamma_t:=(\rho_t,\mu_t)$ to the dynamics eq:dynamics_aligned with initial condition satisfying $\gamma_0\in\mathcal{P}_2(\mathbb{R}^{d_1})\times \mathcal{P}_2(\mathbb{R}^{d_2})$, $F_a(\gamma_0)<\infty$, and [...]. Then the following hold:

Figures (4)

  • Figure 1: After the criteria for government aid were released in 1997, local officials misreported income data to increase the number of constituents qualifying for aid. The PDE eq:dynamics_competitive is able to capture the sharp drop at the classifier threshold. The convergence rates for the loss of the population and of the algorithm are $0.00995$ and $0.0102$; the convergence rate for $\mathscr{W}_2(\rho_t,\rho^{(98)})$, where $\rho^{(98)}$ is the steady state distribution, is $0.0114$, which is similar to the expected rate of $0.01$. The expected rate is computed using convexity properties of the KL term.
  • Figure 2: While the accuracy of the classifier is similar under both interaction models, the precision differs; this indicates that understanding the intra-agent interactions is important for understanding how errors impact subpopulations, in this case, those agents with algorithm label "qualified."
  • Figure 3: Agent densities are split based on their true label (eligible vs ineligible). The color denotes the algorithm output regarding loan qualification (qualified vs unqualified). The repulsive kernel causes the agents to spread apart along attributes 1 (normalized age) and 2 (normalized past due), while the attractive-repulsive kernel causes more swarm-like behavior. This is not evident from classifier performance only, indicating the importance of understanding the population dynamics explicitly.
  • Figure 4: A gradient descent approach outperforms a state-of-the-art technique of learning and using a linear mapping between the classifier parameters and the mean of the strategic distribution, illustrating the importance of having more detailed population models.
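
The caption of Figure 1 compares empirical convergence rates with an expected exponential rate. One common way to estimate such a rate from a time series is a least-squares fit of the log-error against time; the sketch below shows that fit on synthetic data. It is not the authors' procedure: the helper `fit_exponential_rate` is hypothetical, and the synthetic series is generated with a decay rate of $0.01$ purely so the fit can be checked.

```python
import math

def fit_exponential_rate(times, errors):
    """Least-squares fit of log(error) = a - r * t; returns the rate r."""
    logs = [math.log(e) for e in errors]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    cov = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var
    return -slope  # decay rate is the negative slope of the log-error

# Synthetic error curve (assumption): e(t) = 3 * exp(-0.01 * t)
times = [float(t) for t in range(500)]
errors = [3.0 * math.exp(-0.01 * t) for t in times]
rate = fit_exponential_rate(times, errors)
print(rate)
```

On real data the error curve is noisy and may only decay exponentially after a transient, so the fitted window matters.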

Theorems & Definitions (101)

  • Definition 2.1: Weak Convergence
  • Definition 2.2: Joint Wasserstein Metric
  • Definition 2.3: Displacement Convexity (McCann, 1997)
  • Remark 2.4
  • Definition 2.5: Relative Energy
  • Definition 2.6: Steady states
  • Definition 2.7: Nash Equilibrium
  • Remark 3.1: Cauchy-Problem
  • Remark 3.2
  • Theorem 3.3
  • ...and 91 more