Addressing Rotational Learning Dynamics in Multi-Agent Reinforcement Learning
Baraah A. M. Sidahmed, Tatjana Chavdarova
TL;DR
This work identifies rotational learning dynamics as a core driver of instability and reproducibility problems in centralized-training, decentralized-execution (CTDE) MARL. It reframes MARL as a Variational Inequality (VI) problem with operator $F$ and adopts VI solvers, notably nested Lookahead-VI (LA-VI) and Extragradient (EG), to stabilize the joint updates of policies and value functions. The authors introduce LA-MARL and (LA-)EG-MARL, provide convergence guarantees under monotonicity assumptions on the VI, and demonstrate improvements on zero-sum games and Multi-Agent Particle Environment (MPE) benchmarks. The findings indicate that VI-based optimization yields better convergence to equilibrium and stronger team coordination, suggesting a practical and scalable path toward more robust MARL systems.
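For orientation, the VI formulation and the two solvers named above can be written in their standard textbook forms as follows. This is a sketch, not the paper's exact statement: the symbols $z$, $\mathcal{Z}$, $\eta$, $k$, $\alpha$, and $\omega$ are generic choices made here, and the projection onto $\mathcal{Z}$ is omitted (unconstrained case).

```latex
% VI problem: find z^* \in \mathcal{Z} such that
\langle F(z^*),\, z - z^* \rangle \ge 0 \quad \forall z \in \mathcal{Z}

% F is monotone if
\langle F(z) - F(z'),\, z - z' \rangle \ge 0 \quad \forall z, z' \in \mathcal{Z}

% Two-player zero-sum game \min_x \max_y f(x, y) with z = (x, y);
% F is monotone when f is convex in x and concave in y:
F(z) = \begin{bmatrix} \nabla_x f(x, y) \\ -\nabla_y f(x, y) \end{bmatrix}

% Extragradient (EG) with step size \eta:
z_{t+1/2} = z_t - \eta F(z_t), \qquad z_{t+1} = z_t - \eta F(z_{t+1/2})

% Lookahead (LA) with slow iterate \omega_t, k fast steps of a base method, \alpha \in (0, 1]:
\tilde z_0 = \omega_t, \quad
\tilde z_{j+1} = \tilde z_j - \eta F(\tilde z_j) \ (j = 0, \dots, k-1), \quad
\omega_{t+1} = \omega_t + \alpha (\tilde z_k - \omega_t)
```

In the usual construction, nested Lookahead applies the same LA update recursively, with an inner LA loop serving as the base method of an outer one.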
Abstract
Multi-agent reinforcement learning (MARL) has emerged as a powerful paradigm for solving complex problems through agents' cooperation and competition, and has found widespread application across domains. Despite its success, MARL faces a reproducibility crisis. We show that this issue is related, in part, to the rotational optimization dynamics arising from competing agents' objectives, which require methods beyond standard optimization algorithms. We reframe MARL approaches using Variational Inequalities (VIs), offering a unified framework to address such issues. Leveraging optimization techniques designed for VIs, we propose a general approach for integrating gradient-based VI methods capable of handling rotational dynamics into existing MARL algorithms. Empirical results demonstrate significant performance improvements across benchmarks. In the zero-sum games Rock-paper-scissors and Matching pennies, VI methods achieve better convergence to equilibrium strategies, and in the Predator-prey task of the Multi-Agent Particle Environment (MPE), they also enhance team coordination. These results underscore the transformative potential of advanced optimization techniques in MARL.
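To make the rotational dynamics concrete, the sketch below uses the unconstrained bilinear game f(x, y) = x * y, a standard toy stand-in (not taken from the paper) for a centered Matching-pennies payoff, up to a constant factor and with the probability constraints dropped. It compares simultaneous gradient descent-ascent (GDA) with Extragradient, Lookahead-GDA, and a two-level nested Lookahead-GDA. All helper names (`operator`, `gda`, `extragradient`, `lookahead`, `nested_lookahead_gda`, `run`) and all settings (step size, `k`, `alpha`, iteration counts) are hypothetical illustrative choices. This is not the authors' LA-MARL or EG-MARL implementation.

```python
import math

# Toy bilinear game f(x, y) = x * y: x minimizes, y maximizes.
# Its unique equilibrium is (0, 0).


def operator(z):
    """VI operator F(z) = (df/dx, -df/dy) for f(x, y) = x * y."""
    x, y = z
    return (y, -x)


def forward(z, g, lr):
    """Return z - lr * g for 2-D tuples."""
    return (z[0] - lr * g[0], z[1] - lr * g[1])


def gda(z, lr):
    """Simultaneous gradient descent-ascent: one evaluation of F per step."""
    return forward(z, operator(z), lr)


def extragradient(z, lr):
    """Extrapolate with F(z), then update z using F at the extrapolated point."""
    z_half = forward(z, operator(z), lr)
    return forward(z, operator(z_half), lr)


def lookahead(z, base_step, lr, k, alpha):
    """Run k fast steps of base_step from z, then move z a fraction alpha toward the result."""
    fast = z
    for _ in range(k):
        fast = base_step(fast, lr)
    return (z[0] + alpha * (fast[0] - z[0]), z[1] + alpha * (fast[1] - z[1]))


def nested_lookahead_gda(z, lr):
    """Two-level Lookahead (hypothetical settings): inner Lookahead-GDA is the outer base step."""
    def inner(w, step_lr):
        return lookahead(w, gda, step_lr, k=5, alpha=0.5)
    return lookahead(z, inner, lr, k=4, alpha=0.5)


def run(update, iters, z0=(1.0, 1.0)):
    """Apply update `iters` times and return the distance to the equilibrium (0, 0)."""
    z = z0
    for _ in range(iters):
        z = update(z)
    return math.hypot(z[0], z[1])


LR = 0.1
# GDA, LA-GDA, and LA-LA-GDA each take 200 GDA steps in total;
# EG runs 200 iterations, i.e. 400 operator evaluations.
results = {
    "GDA": run(lambda z: gda(z, LR), iters=200),
    "EG": run(lambda z: extragradient(z, LR), iters=200),
    "LA-GDA": run(lambda z: lookahead(z, gda, LR, k=20, alpha=0.5), iters=10),
    "LA-LA-GDA": run(lambda z: nested_lookahead_gda(z, LR), iters=10),
}

print(f"start: distance to equilibrium = {math.hypot(1.0, 1.0):.4f}")
for name, dist in results.items():
    print(f"{name}: distance to equilibrium = {dist:.4f}")
```

For this game each GDA step is a scaled rotation that multiplies the distance to (0, 0) by exactly sqrt(1 + lr**2), so GDA spirals outward. The EG step instead scales it by sqrt(1 - lr**2 + lr**4), which is below 1 for 0 < lr < 1. Lookahead damps the rotation by averaging the slow iterate with the end of each fast loop, so under these settings both Lookahead variants also move toward the equilibrium.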
