Governing Strategic Dynamics: Equilibrium Stabilization via Divergence-Driven Control
Hao Shi, Xi Li, Fangfang Xie
TL;DR
The paper tackles non-stationarity in coevolution caused by opponent drift, which distorts progress signals and fosters cycling. It introduces the Marker Gene Method (MGM), which stabilizes cross-generational evaluation by anchoring it to a persistent marker, applying Dynamic Weight Adjustment (DWAM), and updating the marker conservatively; it also adds NGD-Div, which adapts the DWAM threshold online. The authors provide theoretical stability results for symmetric strictly competitive games and validate MGM-E-NES on Rock-Paper-Scissors variants, coordination games, and a Markov resource game, showing improved stability and welfare-aligned behavior with minimal hyperparameter tuning. The framework transfers robustly across tasks, offering a practical governance approach for making black-box coevolution more reliable in dynamic environments. The authors suggest future extensions to broader Markov games, improved sample efficiency, and integration with other optimizers while preserving the governance principles.
Abstract
Black-box coevolution in mixed-motive games is often undermined by opponent-drift non-stationarity and noisy rollouts, which distort progress signals and can induce cycling, Red-Queen dynamics, and detachment. We propose the \emph{Marker Gene Method} (MGM), a curriculum-inspired governance mechanism that stabilizes selection by anchoring evaluation to cross-generational marker individuals, together with DWAM and conservative marker-update rules to reduce spurious updates. We also introduce NGD-Div, which adapts the key update threshold using a divergence proxy and natural-gradient optimization. We provide theoretical analysis in strictly competitive settings and evaluate MGM integrated with evolution strategies (MGM-E-NES) on coordination games and a resource-depletion Markov game. MGM-E-NES reliably recovers target coordination in Stag Hunt and Battle of the Sexes, achieving final cooperation probabilities close to $(1,1)$ (e.g., $0.991\pm0.01/1.00\pm0.00$ and $0.97\pm0.00/0.97\pm0.00$ for the two players). In the Markov resource game, it maintains high and stable state-conditioned cooperation across 30 seeds, with final cooperation of $\approx 0.954/0.980/0.916$ in \textsc{Rich}/\textsc{Poor}/\textsc{Collapsed} (both players; small standard deviations), indicating welfare-aligned and state-dependent behavior. Overall, MGM-E-NES transfers across tasks with minimal hyperparameter changes and yields consistently stable training dynamics, showing that top-level governance can substantially improve the robustness of black-box coevolution in dynamic environments.
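To make the opponent-drift problem concrete, the following minimal sketch scores one fixed Rock-Paper-Scissors strategy against an opponent that cycles through the pure strategies, and compares it with a score anchored to a fixed reference strategy. It is an illustration under stated assumptions, not the MGM, DWAM, or NGD-Div procedure: the names `payoff`, `blended_score`, and `spread`, the marker strategy, and the fixed blend weight `w` are all hypothetical and do not come from the paper.

```python
import numpy as np

# Row player's payoff in Rock-Paper-Scissors (rows/cols: rock, paper, scissors).
RPS = np.array([[0, -1, 1],
                [1, 0, -1],
                [-1, 1, 0]], dtype=float)


def payoff(x, y):
    """Expected payoff of mixed strategy x against mixed strategy y."""
    return float(x @ RPS @ y)


def blended_score(candidate, opponent, marker, w):
    """Hypothetical anchored score: weight w on the fixed marker,
    weight (1 - w) on the current (drifting) opponent."""
    return w * payoff(candidate, marker) + (1.0 - w) * payoff(candidate, opponent)


def spread(scores):
    """Range of scores assigned to the same, unchanged candidate."""
    return max(scores) - min(scores)


# Assumed values, chosen only for illustration.
candidate = np.array([0.5, 0.3, 0.2])
marker = np.array([0.6, 0.2, 0.2])
drifting_opponents = [np.eye(3)[i] for i in range(3)]  # rock, paper, scissors

# The candidate never changes, yet its raw score moves with the opponent.
raw = [payoff(candidate, o) for o in drifting_opponents]
half = [blended_score(candidate, o, marker, 0.5) for o in drifting_opponents]
full = [blended_score(candidate, o, marker, 1.0) for o in drifting_opponents]

print("raw spread:          ", spread(raw))
print("half-anchored spread:", spread(half))
print("fully anchored spread:", spread(full))
```

In this toy setting, the score variation attributable to opponent drift shrinks in proportion to the weight placed on the fixed reference. A weight of 1 removes the drift entirely but also ignores the current opponent, which is one way to see why a governance scheme might adjust such a weight dynamically rather than fix it.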
