The Lagrangian Method for Solving Constrained Markov Games

Soham Das, Santiago Paternain, Luiz F. O. Chamon, Ceyhun Eksin

TL;DR

The paper addresses constrained Markov games (CMGs) in which agents optimize time-average rewards under joint cost constraints. It introduces a Lagrangian game and an epoch-based primal-dual algorithm that alternates between solving an unconstrained Lagrangian game and updating the dual variable. It proves that the induced policy sequence forms a nonstationary constrained Nash equilibrium for the original CMG while ensuring feasibility almost surely. Under Slater conditions and unbiased rollouts, the method yields an $\epsilon$-NE with $\epsilon = \eta B^2 / 2$, linking dual dynamics to performance guarantees. Practically, the framework lets existing unconstrained Markov game solvers handle constraints, broadening the applicability of safe MARL across model-based and reinforcement-learning approaches.

Abstract

We propose the concept of a Lagrangian game to solve constrained Markov games. Such games model scenarios where agents face cost constraints in addition to their individual rewards, both of which depend on the agents' joint actions and the evolving environment state over time. Constrained Markov games form the formal mechanism behind safe multiagent reinforcement learning, providing a structured model for dynamic multiagent interactions in a multitude of settings, such as autonomous teams operating under local energy and time constraints. We develop a primal-dual approach in which agents solve a Lagrangian game associated with the current Lagrange multiplier, simulate cost and reward trajectories over a fixed horizon, and update the multiplier using accrued experience. This update rule generates a new Lagrangian game, initiating the next iteration. Our key result shows that the sequence of solutions to these Lagrangian games yields a nonstationary Nash solution for the original constrained Markov game.
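The loop described in the abstract (solve a Lagrangian game, roll out for a fixed horizon, update the multiplier) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the paper's algorithm: the toy hunt-or-rest game with a single constraint, the function names `solve_lagrangian_game`, `rollout` and `primal_dual`, the projected subgradient form of the dual update, and the step size `eta`. The defaults for the number of epochs and the epoch size echo the values quoted in the figure captions.

```python
from typing import List, Tuple


def solve_lagrangian_game(lam: float) -> str:
    """Hypothetical stand-in for an unconstrained solver.

    Toy payoffs (assumed): hunting earns reward 1 per step, resting earns
    reward 0 but 1 unit of constraint signal, weighted by lam in the
    Lagrangian. Resting is therefore preferred only when lam exceeds 1.
    """
    return "rest" if lam > 1.0 else "hunt"


def rollout(policy: str, horizon: int) -> Tuple[float, float]:
    """Return (total reward, total resting time) over one epoch."""
    reward = 1.0 if policy == "hunt" else 0.0
    rest = 1.0 if policy == "rest" else 0.0
    return reward * horizon, rest * horizon


def primal_dual(threshold: float = 0.25, eta: float = 0.01,
                epochs: int = 200, horizon: int = 100,
                lam0: float = 0.0) -> Tuple[float, float, List[float]]:
    """Run the epoch-based loop and return time averages and multipliers."""
    lam = lam0
    total_reward = 0.0
    total_rest = 0.0
    lam_history: List[float] = []
    for _ in range(epochs):
        # Primal step: solve the Lagrangian game for the current multiplier.
        policy = solve_lagrangian_game(lam)
        # Simulate the resulting policy over a fixed horizon.
        reward, rest = rollout(policy, horizon)
        total_reward += reward
        total_rest += rest
        # Dual step (assumed form): move lam against the accumulated
        # constraint slack and project back onto lam >= 0.
        slack = rest - threshold * horizon
        lam = max(0.0, lam - eta * slack)
        lam_history.append(lam)
    steps = epochs * horizon
    return total_reward / steps, total_rest / steps, lam_history


if __name__ == "__main__":
    avg_reward, avg_rest, lams = primal_dual()
    print(f"average reward {avg_reward:.3f}, average rest {avg_rest:.3f}")
    print(f"largest multiplier {max(lams):.3f}")
```

In this toy game no single stationary choice both meets the resting threshold and keeps hunting, so the multiplier oscillates around the switching point and the policy alternates between epochs. The time-averaged resting fraction approaches the threshold only across epochs, which is the sense in which the resulting solution is nonstationary.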

Paper Structure

This paper contains 17 sections, 7 theorems, 55 equations, 6 figures, 2 algorithms.

Key Result

Theorem 1

Under the above assumptions, the state-action sequences $(s^t,a^t)$ generated by Algorithms \ref{alg:dualdes} and \ref{alg:gamedyn} are feasible with probability $1$. Moreover, the sequence of policies forms a nonstationary $(\eta B^{2}/2)$-NE for the constrained Markov game ${\mathcal{G}}$.
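
As a rough illustration of how the equilibrium gap in Theorem 1 scales with the step size, consider hypothetical values $\eta = 0.01$ and $B = 2$. These numbers are assumptions chosen for arithmetic convenience, and $B$ stands for the bound defined in the paper's assumptions, which is not reproduced here.

$$\epsilon = \frac{\eta B^{2}}{2} = \frac{0.01 \cdot 2^{2}}{2} = 0.02$$

Because $\epsilon$ is linear in $\eta$, halving the step size to $\eta = 0.005$ halves the gap to $\epsilon = 0.01$.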

Figures (6)

  • Figure 1: Agents can maximize payoffs by cooperating on hunting stag, but are required to spend a fraction of their hunting time at the resting station.
  • Figure 2: The geometry of the grid for the Stag-Hare-Resting Station game. The positions of the players represent the initial state of the game for the trajectory simulations.
  • Figure 3: The evolution of the Lagrange multipliers across $K=200$ epochs for multiple episodes sampled from random initial states, with resting threshold levels 0.25 (blue), 0.5 (orange) and 0.75 (green). The multipliers satisfy the tightness property in Lemma \ref{lem_tightness}.
  • Figure 4: Constraint satisfaction in the simulated episodes: the resting station reward (total time spent at the resting station by both agents) collapses onto the resting time threshold (threshold values 0.25, 0.50 and 0.75 in blue, orange and green respectively). Episodes are simulated from random initial states. Number of epochs $K=200$, epoch size $T_0=100$.
  • Figure 5: The agents' time-averaged cumulative values converge across episodes, constituting an $\epsilon$-NE. The blue, orange and green curves represent resting thresholds of 0.25, 0.50 and 0.75 respectively. Episodes are simulated from random initial states. Number of epochs $K=200$, epoch size $T_0=100$.
  • ...and 1 more figure

Theorems & Definitions (19)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Remark 1
  • Definition 4
  • Definition 5
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 9 more