Governing Strategic Dynamics: Equilibrium Stabilization via Divergence-Driven Control

Hao Shi, Xi Li, Fangfang Xie

TL;DR

The paper tackles non-stationarity in coevolution caused by opponent drift, which distorts progress signals and fosters cycling. It introduces the Marker Gene Method (MGM), which stabilizes cross-generational evaluation by anchoring against a persistent marker, combining the marker-based score with a generalization score through a Dynamic Weight Adjustment Mechanism (DWAM), and updating the marker conservatively. It also adds NGD-Div, which adapts the DWAM threshold online. The authors provide theoretical stability results in symmetric strictly competitive games and validate MGM-E-NES across Rock-Paper-Scissors variants, coordination games, and a Markov resource game, showing improved stability and welfare-aligned behavior with minimal hyperparameter tuning. The framework transfers across tasks, offering a practical governance approach to improving the reliability of black-box coevolution in dynamic environments. The work suggests future extensions to broader Markov games, improved sample efficiency, and integration with other optimizers while preserving the governance principles.

Abstract

Black-box coevolution in mixed-motive games is often undermined by opponent-drift non-stationarity and noisy rollouts, which distort progress signals and can induce cycling, Red-Queen dynamics, and detachment. We propose the Marker Gene Method (MGM), a curriculum-inspired governance mechanism that stabilizes selection by anchoring evaluation to cross-generational marker individuals, together with the Dynamic Weight Adjustment Mechanism (DWAM) and conservative marker-update rules to reduce spurious updates. We also introduce NGD-Div, which adapts the key update threshold using a divergence proxy and natural-gradient optimization. We provide theoretical analysis in strictly competitive settings and evaluate MGM integrated with evolution strategies (MGM-E-NES) on coordination games and a resource-depletion Markov game. MGM-E-NES reliably recovers target coordination in Stag Hunt and Battle of the Sexes, achieving final cooperation probabilities close to $(1,1)$ (e.g., $0.991\pm0.01/1.00\pm0.00$ and $0.97\pm0.00/0.97\pm0.00$ for the two players). In the Markov resource game, it maintains high and stable state-conditioned cooperation across 30 seeds, with final cooperation of $\approx 0.954/0.980/0.916$ in Rich/Poor/Collapsed (both players; small standard deviations), indicating welfare-aligned and state-dependent behavior. Overall, MGM-E-NES transfers across tasks with minimal hyperparameter changes and yields consistently stable training dynamics, showing that top-level governance can substantially improve the robustness of black-box coevolution in dynamic environments.

Paper Structure

This paper contains 80 sections, 39 equations, 11 figures, 3 tables, 4 algorithms.

Figures (11)

  • Figure 1: This figure provides a local visualization of the Dynamic Weight Adjustment Mechanism (DWAM) in the vicinity of the threshold. Specifically, the 3D surface shows the resulting comprehensive fitness $\widehat{F}_i^{(t)}$ when the marker-based score $\widehat{B}_i^{(t)}$ and the generalization score $\widehat{G}_i^{(t)}$ vary within $[0.9,1.0]$, while the threshold is fixed at $l=0.9$ and the scale parameter is fixed at $s=10^{2}$.
  • Figure 2: Convergence to the Nash equilibrium (NE) in RPS, measured by $\log_{10}$ KL divergence between the mean population strategy and the NE mixed strategy. Lower is better. Shaded bands indicate the interquartile range (25th--75th percentiles) and the empirical 5th--95th percentile interval across runs.
  • Figure 3: MGM dynamics in RPS over 1000 generations: (a) average fitness for Pop1 and Pop2; (b) average strategy probabilities for Pop1 and Pop2 (dashed line at $1/3$); (c) marker gene strategy probabilities for Pop1; and (d) marker gene strategy probabilities for Pop2.
  • Figure 4: Effect of population size on convergence in the RPS game. The curves show the mean KL divergence from the population's average strategy to the NE over 10 independent runs for $N=100$ and $N=50$. Shaded bands indicate the interquartile range (25th--75th percentiles) and the empirical 5th--95th percentile interval, reflecting stability and run-to-run variability.
  • Figure 5: Convergence trajectory against baselines.
  • ...and 6 more figures
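
The convergence metric named in the captions of Figures 2 and 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the direction of the divergence (KL from the mean population strategy to the NE), the use of natural logarithms, and the `floor` that keeps $\log_{10}$ finite at an exact match are all assumptions. The percentile helper uses plain linear interpolation, which may differ from whatever the paper used for its shaded bands.

```python
import math

# Nash equilibrium of standard Rock-Paper-Scissors: uniform over the three actions.
RPS_NE = (1 / 3, 1 / 3, 1 / 3)


def mean_strategy(population):
    """Average a list of mixed strategies (each a probability vector)."""
    n = len(population)
    k = len(population[0])
    return [sum(ind[j] for ind in population) / n for j in range(k)]


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes every q_i > 0 (true for the RPS NE)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def log10_kl_to_ne(population, ne=RPS_NE, floor=1e-16):
    """log10 of KL(mean strategy || NE); `floor` is a hypothetical guard
    so that an exact match does not produce log10(0)."""
    kl = kl_divergence(mean_strategy(population), ne)
    return math.log10(max(kl, floor))


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bands(per_run_values):
    """5th, 25th, 75th, and 95th percentiles of one generation's metric across runs."""
    return {q: percentile(per_run_values, q) for q in (5, 25, 75, 95)}
```

Calling `log10_kl_to_ne` once per generation and per run, then `bands` across runs at each generation, would give the kind of curve and shaded region the captions describe. Note that a population of pure strategies can still have a mean strategy at the NE, so this metric measures the population average rather than individual convergence.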