Mutation-Bias Learning in Games
Johann Bauer, Sheldon West, Eduardo Alonso, Mark Broom
TL;DR
This work addresses convergence in multi-agent reinforcement learning (MARL) by formulating two mutation-bias learning variants, MBL-DPU and MBL-LC, that connect to mutation-perturbed replicator dynamics. The authors establish a direct link between the stochastic updates and the ODE system, prove convergence properties, and show how the mutation perturbation drives interior equilibria toward Nash equilibria in several settings. Compared with FAQ and WoLF-PHC, MBL-DPU offers stronger analytic guarantees and robustness to increasing dimensionality, while MBL-LC trades some reliability for faster convergence in simpler games. The results demonstrate the value of a dynamical-systems perspective for MARL: analytic results transfer across settings and guide parameter choices for convergence and generalisation in practice.
Abstract
We present two variants of a multi-agent reinforcement learning algorithm based on evolutionary game-theoretic considerations. The intentional simplicity of one variant enables us to prove results on its relationship to a system of ordinary differential equations of replicator-mutator type, and to prove the algorithm's convergence conditions in various settings via its ODE counterpart. The more complicated variant enables comparisons to Q-learning-based algorithms. We compare both variants experimentally to WoLF-PHC and frequency-adjusted Q-learning on a range of settings, illustrating cases of increasing dimensionality where our variants preserve convergence in contrast to more complicated algorithms. The availability of analytic results provides a degree of transferability of results as compared to purely empirical case studies, illustrating the general utility of a dynamical systems perspective on multi-agent reinforcement learning when addressing questions of convergence and reliable generalisation.
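To make the ODE viewpoint concrete, the following is a minimal sketch of a mutation-perturbed replicator system for two-player matching pennies. It is not MBL-DPU or MBL-LC, and the exact ODE studied in the paper is not given in this section. The sketch assumes one common form, a replicator term plus a uniform mutation term M(1/n - x_i). The game, the mutation strength, the step size, and the function names (`step`, `simulate`, `distance_to_uniform`) are all illustrative choices, not taken from the paper.

```python
import math


def step(x, y, mutation, dt):
    """One explicit Euler step; x and y are the probabilities of action 0."""
    # Replicator terms for matching pennies (player 1 wants to match,
    # player 2 wants to mismatch), reduced to one coordinate per player.
    dx = 2.0 * x * (1.0 - x) * (2.0 * y - 1.0)
    dy = -2.0 * y * (1.0 - y) * (2.0 * x - 1.0)
    # Assumed mutation term: a drift toward the uniform strategy (0.5).
    dx += mutation * (0.5 - x)
    dy += mutation * (0.5 - y)
    return x + dt * dx, y + dt * dy


def simulate(x0, y0, mutation, dt=0.01, steps=20000):
    """Integrate the two-player system from (x0, y0) with fixed-step Euler."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = step(x, y, mutation, dt)
    return x, y


def distance_to_uniform(x, y):
    """Euclidean distance to (0.5, 0.5), the Nash equilibrium of this game."""
    return math.hypot(x - 0.5, y - 0.5)


if __name__ == "__main__":
    x, y = simulate(0.9, 0.2, mutation=0.1)
    print(distance_to_uniform(x, y))
```

In this toy system the linearisation around (0.5, 0.5) has eigenvalues -M ± i, so any positive mutation strength M turns the neutral cycling of the unperturbed replicator dynamics into a stable spiral. Setting `mutation=0.0` in `simulate` recovers the unperturbed case for comparison.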
