The Lagrangian Method for Solving Constrained Markov Games
Soham Das, Santiago Paternain, Luiz F. O. Chamon, Ceyhun Eksin
TL;DR
The paper addresses constrained Markov games (CMGs) in which agents optimize time-average rewards under joint cost constraints. It introduces a Lagrangian game and an epoch-based primal-dual algorithm that alternates between solving an unconstrained Lagrangian game and updating the dual variable. It proves that the induced policy sequence forms a nonstationary constrained Nash equilibrium for the original CMG and that the constraints are satisfied almost surely. Under Slater conditions and unbiased rollouts, the method yields an $\epsilon$-NE with $\epsilon = \eta B^2 / 2$, linking dual dynamics to performance guarantees. Practically, the framework lets existing solvers for unconstrained Markov games handle constraints, broadening the applicability of safe multiagent reinforcement learning (MARL) across model-based and reinforcement-learning approaches.
Abstract
We propose the concept of a Lagrangian game to solve constrained Markov games. Such games model scenarios where agents face cost constraints in addition to their individual rewards, both of which depend on the agents' joint actions and the evolving environment state over time. Constrained Markov games form the formal mechanism behind safe multiagent reinforcement learning, providing a structured model for dynamic multiagent interactions in a multitude of settings, such as autonomous teams operating under local energy and time constraints. We develop a primal-dual approach in which agents solve a Lagrangian game associated with the current Lagrange multiplier, simulate cost and reward trajectories over a fixed horizon, and update the multiplier using accrued experience. This update rule generates a new Lagrangian game, initiating the next iteration. Our key result is that the sequence of solutions to these Lagrangian games yields a nonstationary Nash solution for the original constrained Markov game.
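The primal-dual loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stateless two-agent game, the function names (solve_lagrangian_game, rollout, primal_dual), the constraint form (average joint cost at most budget), and the projected dual ascent step with step size eta are all assumptions made for illustration.

```python
def solve_lagrangian_game(lam):
    """Hypothetical stand-in for an unconstrained game solver.

    Toy game (an assumption): two agents each pick effort a_i in {0, 1}.
    Agent i's Lagrangian payoff is a_i - lam * (a_1 + a_2), so a_i = 1 is
    the best response when lam < 1 and a_i = 0 otherwise.
    """
    a = 1 if lam < 1.0 else 0
    return (a, a)


def rollout(policy, horizon):
    """Simulate `horizon` steps and return average rewards and average cost."""
    reward_sums = [0.0 for _ in policy]
    cost_sum = 0.0
    for _ in range(horizon):
        for i, a in enumerate(policy):
            reward_sums[i] += float(a)      # individual reward r_i = a_i
        cost_sum += float(sum(policy))      # joint cost = total effort
    avg_rewards = [r / horizon for r in reward_sums]
    return avg_rewards, cost_sum / horizon


def primal_dual(num_epochs, horizon=10, eta=0.25, budget=1.0):
    """Alternate between solving the Lagrangian game and a dual update."""
    lam = 0.0
    total_cost = 0.0
    policies = []
    for _ in range(num_epochs):
        policy = solve_lagrangian_game(lam)           # primal step
        _, avg_cost = rollout(policy, horizon)        # accrue experience
        # Assumed dual step: projected ascent on the constraint violation.
        lam = max(0.0, lam + eta * (avg_cost - budget))
        total_cost += avg_cost
        policies.append(policy)
    return total_cost / num_epochs, policies


if __name__ == "__main__":
    avg_cost, policies = primal_dual(num_epochs=1000)
    print("time-average joint cost:", avg_cost)
    print("distinct per-epoch policies:", sorted(set(policies)))
```

In this toy, no single stationary joint action from the solver meets the budget exactly: the multiplier settles into an oscillation around 1, the agents alternate between full effort and no effort, and the time-average joint cost approaches the assumed budget. This mirrors, in a very simplified way, why the solution is a nonstationary sequence of policies rather than one fixed policy.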
