Environment Complexity and Nash Equilibria in a Sequential Social Dilemma
Mustafa Yasir, Andrew Howes, Vasilios Mavroudis, Chris Hicks
TL;DR
This paper investigates how environment complexity in higher-dimensional sequential social dilemmas affects cooperative outcomes in multi-agent reinforcement learning. By adapting the gridworld Stag Hunt into eight variants of differing complexity and analyzing independent proximal policy optimization (PPO) learners, the authors show that greater complexity biases convergence toward suboptimal, risk-dominant Nash equilibria, even when higher-reward strategies exist. Through curriculum experiments and empirical game-theoretic analysis, they demonstrate that some Group B environments can be mapped to matrix game social dilemma (MGSD) and sequential social dilemma (SSD) structures, and that the suboptimal equilibria are robust to learning dynamics, although more cooperative strategies can be learned under guided training. The work highlights the critical role of environment dynamics in shaping general-sum MARL outcomes and provides a framework for linking complex RL environments to classic game-theoretic analyses.
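The idea of mapping a gridworld environment onto a matrix-game social dilemma can be illustrated with a minimal sketch of one empirical game-theoretic step: averaging sampled episode returns into a symmetric two-strategy meta-game, then classifying that game with the standard social-dilemma inequalities (R > P, R > S, 2R > T + S, plus greed T > R and/or fear P > S). This is not the authors' pipeline. The game is assumed to be symmetric, and the strategy labels, the `EPISODE_RETURNS` numbers, and the helper names `empirical_payoffs` and `classify` are all hypothetical.

```python
from statistics import mean

# Hypothetical per-episode returns for the row agent, keyed by
# (row strategy, column strategy). "C" = hunt stag, "D" = hunt hare.
# Illustrative numbers only; they are NOT results from the paper.
EPISODE_RETURNS = {
    ("C", "C"): [9.0, 10.0, 11.0],
    ("C", "D"): [0.0, 1.0, 0.5],
    ("D", "C"): [4.0, 5.0, 6.0],
    ("D", "D"): [3.0, 3.5, 2.5],
}


def empirical_payoffs(returns):
    """Average sampled returns into a symmetric 2x2 meta-game (R, S, T, P)."""
    r = mean(returns[("C", "C")])  # reward for mutual cooperation
    s = mean(returns[("C", "D")])  # sucker's payoff
    t = mean(returns[("D", "C")])  # temptation to defect
    p = mean(returns[("D", "D")])  # punishment for mutual defection
    return r, s, t, p


def classify(r, s, t, p):
    """Classify a symmetric 2x2 game using the social-dilemma inequalities."""
    is_dilemma = r > p and r > s and 2 * r > t + s
    greed = t > r  # defecting on a cooperator pays more than cooperating
    fear = p > s   # cooperating against a defector is the worst outcome
    if not (is_dilemma and (greed or fear)):
        return "not a social dilemma"
    if fear and not greed:
        return "stag hunt"
    if greed and not fear:
        return "chicken"
    return "prisoner's dilemma"


r, s, t, p = empirical_payoffs(EPISODE_RETURNS)
print(classify(r, s, t, p))  # stag hunt
```

With these made-up returns the meta-game has fear but no greed, which is the Stag Hunt case; greed without fear would be classified as Chicken, and both together as a Prisoner's Dilemma.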
Abstract
Multi-agent reinforcement learning (MARL) methods, while effective in zero-sum or positive-sum games, often yield suboptimal outcomes in general-sum games, where cooperation is essential for reaching the global optimum. Matrix game social dilemmas, which abstract key aspects of general-sum interactions such as cooperation, risk, and trust, fail to model the temporal and spatial dynamics characteristic of real-world scenarios. In response, our study extends matrix game social dilemmas into more complex, higher-dimensional MARL environments. We adapt a gridworld implementation of the Stag Hunt dilemma to more closely match the decision space of a one-shot matrix game while also introducing variable environment complexity. Our findings indicate that as complexity increases, MARL agents trained in these environments converge to suboptimal strategies, consistent with the risk-dominant Nash equilibrium strategies found in matrix games. Our work highlights the impact of environment complexity on achieving optimal outcomes in higher-dimensional game-theoretic MARL environments.
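To make "risk-dominant" concrete, the following minimal sketch enumerates the pure Nash equilibria of a one-shot, symmetric Stag Hunt and ranks them by the Harsanyi-Selten criterion, which for a 2x2 game favours the equilibrium with the larger product of the players' unilateral deviation losses. The payoff values (R = 4, S = 0, T = 3, P = 2) and the function names are hypothetical, chosen only so that the payoff-dominant and risk-dominant equilibria differ; they are not taken from the paper's environments.

```python
from itertools import product

# Hypothetical symmetric Stag Hunt payoffs for the row player (not the
# paper's values): R = both hunt stag, S = stag vs hare,
# T = hare vs stag, P = both hunt hare. Ordering R > T > P > S.
R, S, T, P = 4.0, 0.0, 3.0, 2.0
STRATEGIES = ("stag", "hare")
OTHER = {"stag": "hare", "hare": "stag"}


def payoff(me, other):
    """Payoff to a player choosing `me` against a partner choosing `other`."""
    table = {
        ("stag", "stag"): R,
        ("stag", "hare"): S,
        ("hare", "stag"): T,
        ("hare", "hare"): P,
    }
    return table[(me, other)]


def pure_nash_equilibria():
    """Profiles (row, col) where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(STRATEGIES, repeat=2):
        row_ok = all(payoff(a, b) >= payoff(x, b) for x in STRATEGIES)
        col_ok = all(payoff(b, a) >= payoff(y, a) for y in STRATEGIES)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


def deviation_loss_product(profile):
    """Harsanyi-Selten product of each player's loss from deviating alone."""
    a, b = profile
    row_loss = payoff(a, b) - payoff(OTHER[a], b)
    col_loss = payoff(b, a) - payoff(OTHER[b], a)
    return row_loss * col_loss


eqs = pure_nash_equilibria()
risk_dominant = max(eqs, key=deviation_loss_product)
# Symmetric equilibria give both players the same payoff, so the row
# payoff is enough to compare them.
payoff_dominant = max(eqs, key=lambda e: payoff(*e))
print(eqs)              # [('stag', 'stag'), ('hare', 'hare')]
print(risk_dominant)    # ('hare', 'hare')
print(payoff_dominant)  # ('stag', 'stag')
```

Mutual stag hunting pays more (4 versus 2), but against a partner equally likely to choose either action, hunting hare yields an expected 2.5 versus 2 for stag. This is the matrix-game analogue of agents settling on the safer but lower-reward equilibrium.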
