Disc Game Dynamics: A Latent Space Perspective on Selection and Learning in Games

Pablo Lechon-Alonso, Andrew Dennehy, Ruizheng Bai, Nicolas Sanchez, Derek K. Wise, David Sewell, David Rosenbluth, Alexander Strang

TL;DR

This work introduces disc-game embedding, a latent-space representation for symmetric, zero-sum two-player games that decomposes a generic payoff function into a linear combination of disc games in a transformed coordinate system. The authors show that learning dynamics, notably the continuous replicator equation, reduce to finite-dimensional Hamiltonian dynamics in the latent space; the reduction is exact when the payoff has finite rank, optimal as an approximation otherwise, and exactly equivalent to adaptive dynamics in the transformed coordinates. Geometrically, trajectories are recurrent and oscillatory (orbits about centers) unless the convex hull of the embedding excludes the origin, in which case boundary-driven, non-recurrent behavior emerges. Metapopulation and frequency-dependent generalizations are also analyzed. Practically, the embedding identifies a minimal-rank representation, which could decouple the cost of simulation from the number of attributes used to define agents. Overall, the paper argues that disc-game embedding offers a unifying, dynamical alternative to static equilibrium thinking for these game-theoretic learning problems.

Abstract

Evolutionary game theory studies populations that change in response to an underlying game. Often, the functional form relating outcome to player attributes or strategy is complex, preventing mathematical progress. In this work, we axiomatically derive a latent space representation for pairwise, symmetric, zero-sum games by seeking a coordinate space in which the optimal training direction for an agent responding to an opponent depends only on their opponent's coordinates. The associated embedding represents the original game as a linear combination of copies of a simple game, the disc game, in a new coordinate space. In this article, we show that disc-game embedding is useful for studying learning dynamics. We demonstrate that a series of classical evolutionary processes simplify to constrained oscillator equations in the latent space. In particular, the continuous replicator equation reduces to a Hamiltonian system of coupled oscillators that exhibit Poincaré recurrence. This reduction allows exact, finite-dimensional closure when the underlying game is finite-rank, and optimal approximation otherwise. It also establishes an exact equivalence between the continuous replicator equation and adaptive dynamics in the transformed coordinates. By identifying a minimal rank representation, the disc game embedding offers numerical methods that could decouple the cost of simulation from the number of attributes used to define agents. These results generalize to metapopulation models that mix inhomogeneously, and to any time-differentiable dynamic where the rate of growth of a type, relative to its expected payout, is a nonnegative function of its frequency. We recommend disc-game embedding as an organizing paradigm for learning and selection in response to symmetric two-player zero-sum games.
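
As a rough illustration of the decomposition described in the abstract, the sketch below embeds a finite, skew-symmetric payoff matrix as a sum of disc games. It is not the authors' implementation: it assumes the game has already been sampled into a matrix `F` with `F[i, j] = f(x(i), x(j))`, it uses the eigendecomposition of the Hermitian matrix `1j * F` as one convenient route to the embedding, and it adopts the sign convention that a disc game pays `y1 * y2' - y2 * y1'`. The names `disc_game_embedding` and `disc_payoff` are hypothetical.

```python
import numpy as np

def disc_game_embedding(F, tol=1e-10):
    """Embed a skew-symmetric payoff matrix as a sum of disc games.

    Returns Y with shape (n_agents, n_disc_games, 2) and the positive
    eigenvalues of 1j*F, sorted in decreasing order (assumed convention).
    """
    F = np.asarray(F, dtype=float)
    if not np.allclose(F, -F.T):
        raise ValueError("F must be skew-symmetric (zero-sum, symmetric game)")
    F = 0.5 * (F - F.T)  # remove any round-off asymmetry

    # 1j*F is Hermitian, so its eigenvalues are real and come in +/- pairs.
    lam, V = np.linalg.eigh(1j * F)
    keep = lam > tol                      # one disc game per positive eigenvalue
    order = np.argsort(-lam[keep])
    lam, V = lam[keep][order], V[:, keep][:, order]

    # For v = a + i*b with eigenvalue lam > 0: F = sum 2*lam*(b a^T - a b^T).
    scale = np.sqrt(2.0 * lam)
    Y = np.empty((F.shape[0], lam.size, 2))
    Y[:, :, 0] = V.imag * scale           # first coordinate of each disc game
    Y[:, :, 1] = V.real * scale           # second coordinate of each disc game
    return Y, lam

def disc_payoff(Y):
    """Rebuild the payoff matrix as a sum of 2D cross products."""
    y1, y2 = Y[..., 0], Y[..., 1]
    return y1 @ y2.T - y2 @ y1.T

# Rock-paper-scissors: a rank-2 game, so a single disc game suffices.
rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
Y_rps, lam_rps = disc_game_embedding(rps)
```

Truncating `Y` to its leading disc games gives a low-rank approximation of the sampled matrix, which is the finite-sample analogue of the closure and approximation statements above.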

Paper Structure

This paper contains 57 sections, 156 equations, 15 figures.

Figures (15)

  • Figure 1: Payout matrices for a population implementation of the iterated prisoner's dilemma (see Appendix Section \ref{app: example game}). The color of the $i,j$ entry represents $f(x(i),x(j))$ for a population of 800 randomly drawn agents with attributes $\{x(i) \}_{i=1}^{800}$. The matrices represent the same data, but with the population ordered according to different attributes. Note that, while performance could be roughly approximated as a simple function of the attributes $p_*$ and $\gamma$, in both cases an exact description is difficult. For example, the left panel shows that agents with large $p_*$ (rows near the bottom) tend to lose to agents with a small $p_*$ (columns near the left). However, this game is not transitively ordered by $p_*$ alone, as there exist upsets against this prediction. The upper left half is mostly, but not entirely, yellow, and the bottom half is mostly, but not entirely, blue. The upsets correspond to particular choices of the pair $(p_*,\gamma)$ that reverse the general trend in $p_*$. Even in this two-parameter case, it is not immediately apparent how to extract a simple strategic description from this data, despite the apparent structure.
  • Figure 2: Disc game geometries. In all panels, the two coordinates correspond to $y_1$ and $y_2$, and the circulating grey vector field represents the optimal training response to an agent at each possible location in the disc-game. This is the optimal self-play vector field, $v(y,y)$. It represents the direction in which agents should move when training against themselves.
  • Figure 3: Disc game embedding for a two-dimensional trait space (top row) and a one-dimensional trait space (bottom row). The left-hand column shows the original trait spaces, $\Omega$ (magenta region), and a pair of agents in each, with attribute vectors $x$ and $x'$ (marked with a square and a triangle). Competitive advantage is evaluated using $f$ in the original space. In the original attribute space, $f$ may be arbitrarily complicated. The arrow represents the transformation $T$, which maps $\Omega$ to $\Psi$. The right-hand column represents the disc game embedding. The coordinates $y$ are broken into consecutive pairs, each composing a separate game. The image of the original attribute space under the embedding, $\Psi$, is shown in magenta. The images of $x$ and $x'$, $y = T(x)$ and $y' = T(x')$, are shown with matching markers. In the embedding space, the competitive advantage between $y$ and $y'$ is simple, and is determined by a disc game, or, equivalently, a cross-product between the embedding coordinates. The value of the cross-product equals the value of the line integral from $y'$ to $y$ against the optimal local training vector field, $v$ (shown in grey in each disc game). Note: although we motivated the disc game embedding assuming smooth $T$, disc game embeddings may use nondifferentiable and discontinuous $T$. The regularity of $T$ is constrained by the regularity of $f$ (see Section \ref{sec: regularity}).
  • Figure 4: The disc game representation of the IPD payout matrices presented in Figure \ref{fig: IPD Payout Matrices}. For the full game specification, see Appendix Section \ref{app: example game}. All three panels plot the coordinates $[y^{(1)}_1(x),y^{(1)}_2(x)]$ for each agent of type $x$ in the sampled population. The first 800 agents are bolded. Small scatter points correspond to an interpolation of the embedded 800 agents using 3,000 agents. The interpolation is added to smooth the visual trends, and to highlight the apparently simple underlying functional relation mapping from $x$ to $y$. Here $T$ is only solved for pointwise. Only the first disc game is shown since it accounts for 90% of the total variance in performance. Left: Agents are colored according to their average performance, in the first disc game, against the 800 agent population. Since the disc game is bilinear, the average payout is determined by a cross-product against the centroid in the disc game space, marked with a black diamond and denoted $\bar{y}$. Agents with embedded attributes $y_*$, marked with a black square, perform neutrally in the disc game. They correspond to a Nash Equilibrium policy as their disc-game payout against any opponent is zero, and their opponent's payout is also zero. The remaining marked agents correspond to agents with distinct behaviors. For an interpretive key, see Box 1. Middle: Agents are colored according to their innate preference for cooperation, $p_*$. The value of $p_*$ sets the probability with which they cooperate before interacting with an opponent, and is the policy they would return to in the absence of information about their opponent. Notice that the phase around the origin is a smooth function of $p_*$, with two fan-shaped lobes. Within each lobe, agents with smaller $p_*$ possess an advantage over agents with a larger $p_*$. That is, within each lobe, it is better to be distrustful. Right: Agents are colored by an attribute, $\gamma$, which controls how they react to their opponent's actions. Agents with large positive $\gamma$ imitate their opponent, agents with $\gamma$ near zero mostly don't respond to their opponent, and agents with large negative $\gamma$ play the opposite of the action their opponent last played. Again, note that position in the disc game is a smooth function of the attribute $\gamma$. In particular, the horizontal coordinate $y_1$ is approximately monotonic in $\gamma$. The two fans correspond to $\gamma < 0.4$ (right fan), and $\gamma > 0.4$ (left fan).
  • Figure 5: The invariant subspace $\mathcal{U}(\nu,u_0)$. The blue shaded parallelogram represents the subspace. The range of $F_S$ is represented by the pair of vectors spanning the subspace. The null space of $F$ given the uniform measure on $S$ is represented by the vertical direction perpendicular to the subspace. The oval marked $\text{Uni}(S)$ represents the constant function on $S$. The vector labeled $z_0$ is the null vector fixing the offset of $\mathcal{U}$. Changing the offset shifts the subspace along a direction contained in the null space. The vector labeled $u_0$ points to an initial log-density, $u(0)$. The concentric ovals represent sample orbits of the log density over time, $u(t)$. These remain within the invariant subspace. They orbit an equilibrium $u_*$ corresponding to a Nash equilibrium that is fully mixed over $S$. The offset fixing the position of the subspace can be specified by $u_0$, $x_0$, or $u_*$ (when it exists). Distributions with support contained in, but not equal to $S$ correspond to infinite limits where $u$ diverges. The dashed grey lines represent the coordinates imposed by the choice of reference measure $\nu$, and the ensuing embedding functions $y_{\nu}$. Changing reference measure amounts to changing this basis via an invertible linear transformation, as illustrated in the two small panels to the right of the main panel.
  • ...and 10 more figures
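
The Figure 4 caption notes that, because a disc game is bilinear, an agent's average payout is a cross-product against the population centroid $\bar{y}$. A minimal sketch of what that implies for a discrete replicator equation with a single disc game follows. The three-type population, the RK4 integrator, the function names, and the sign convention `y1 * z2 - y2 * z1` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def cross(y, z):
    """Disc-game payoff as a 2D cross product (sign convention assumed)."""
    return y[..., 0] * z[..., 1] - y[..., 1] * z[..., 0]

def replicator_rhs(p, Y):
    """dp_i/dt = p_i * cross(y_i, ybar).

    The growth rates depend on p only through the centroid ybar, and the
    mean payoff cross(ybar, ybar) vanishes, so no mean term is subtracted.
    """
    ybar = p @ Y
    return p * cross(Y, ybar)

def simulate(p0, Y, dt=0.01, steps=2000):
    """Integrate the replicator equation with classical fourth-order Runge-Kutta."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        k1 = replicator_rhs(p, Y)
        k2 = replicator_rhs(p + 0.5 * dt * k1, Y)
        k3 = replicator_rhs(p + 0.5 * dt * k2, Y)
        k4 = replicator_rhs(p + dt * k3, Y)
        p = p + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p

# Hypothetical population: three types spaced 120 degrees apart on the unit circle.
angles = 2.0 * np.pi * np.arange(3) / 3.0
Y = np.column_stack([np.cos(angles), np.sin(angles)])

q = np.full(3, 1.0 / 3.0)        # q @ Y is the origin, so q is an equilibrium mix
p0 = np.array([0.5, 0.3, 0.2])   # initial type frequencies
p1 = simulate(p0, Y)

# When the origin lies in the convex hull of the embedded types (here q @ Y = 0),
# the weighted log-frequency sum q @ log(p) stays constant along the exact flow.
drift = abs(q @ np.log(p1) - q @ np.log(p0))
```

In this example `drift` stays near zero up to integration error, consistent with orbits that circulate around the equilibrium rather than converging to it.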