Distributionally Robust Markov Games with Average Reward

Zachary Roch, Yue Wang

TL;DR

This work addresses decision-making in multi-agent environments under model uncertainty with an average-reward criterion. It develops a rigorous theory for the existence of stationary robust Nash equilibria by linking best-response policies to induced single-agent robust MDPs via solvable robust Bellman equations, under irreducible and weakly communicating structures. Two convergent algorithms are proposed: Robust Nash-Iteration (with oracle-based convergence guarantees) and a TD-based proximal-descent method (oracle-free), providing practical pathways to robust equilibria. Finally, the paper connects average-reward robust NE to discounted robust NE, so that the former can be approximated by the latter when the discount factor is large, which offers computational leverage for challenging long-horizon multi-agent problems.

Abstract

We study distributionally robust Markov games (DR-MGs) with the average-reward criterion, a framework for multi-agent decision-making under uncertainty over extended horizons. In average-reward DR-MGs, agents aim to maximize their worst-case infinite-horizon average reward, ensuring satisfactory performance under environment uncertainties and opponent actions. We first establish a connection between the best-response policies and the optimal policies of the induced single-agent problems. Under a standard irreducibility assumption, we derive a correspondence between the optimal policies and the solutions of the robust Bellman equation, and use these results to derive the existence of a stationary Nash equilibrium (NE). We further study DR-MGs in the weakly communicating setting, where we construct a set-valued map, show that its values are convex subsets of the best-response policies and that the map is upper hemi-continuous, and derive the existence of an NE. We then explore algorithmic solutions, first proposing a Robust Nash-Iteration algorithm and providing convergence guarantees under some additional assumptions and an NE-computing oracle. We further develop a temporal-difference-based algorithm for DR-MGs and provide convergence guarantees without any additional oracle or assumptions. Finally, we connect average-reward robust NE to discounted ones, showing that an average-reward robust NE can be approximated by discounted ones under a large discount factor. Our studies provide a comprehensive theoretical and algorithmic foundation for decision-making in complex, uncertain, and long-running multi-player environments.
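The link between the average-reward and discounted criteria mentioned in the abstract can be illustrated on a toy example. The sketch below is not the paper's construction: it assumes a single fixed policy, a two-state chain, and a hypothetical uncertainty set made of just two candidate transition kernels (`KERNEL_A`, `KERNEL_B`), all of which are invented for illustration. It only shows the classical fact that the scaled discounted value (1 - gamma) * V_gamma approaches the average reward as gamma tends to 1, applied to the worst case over that finite set.

```python
# Toy illustration only: kernels, rewards, and function names are hypothetical
# and are not taken from the paper.

REWARD = [1.0, 0.0]                      # reward received in state 0 and state 1
KERNEL_A = [[0.9, 0.1], [0.5, 0.5]]      # candidate transition matrix A
KERNEL_B = [[0.6, 0.4], [0.2, 0.8]]      # candidate transition matrix B
UNCERTAINTY_SET = [KERNEL_A, KERNEL_B]   # finite set standing in for model uncertainty


def average_reward(P, r):
    """Long-run average reward of an irreducible two-state chain."""
    p01, p10 = P[0][1], P[1][0]
    pi0 = p10 / (p01 + p10)              # stationary probability of state 0
    return pi0 * r[0] + (1.0 - pi0) * r[1]


def discounted_value(P, r, gamma):
    """Solve (I - gamma * P) V = r for a 2x2 system with Cramer's rule."""
    a, b = 1.0 - gamma * P[0][0], -gamma * P[0][1]
    c, d = -gamma * P[1][0], 1.0 - gamma * P[1][1]
    det = a * d - b * c
    return [(r[0] * d - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


def worst_case_average(kernels, r):
    """Worst-case average reward over the finite set of kernels."""
    return min(average_reward(P, r) for P in kernels)


def worst_case_scaled_discounted(kernels, r, gamma, state):
    """Worst-case value of (1 - gamma) * V_gamma(state) over the kernels."""
    return min((1.0 - gamma) * discounted_value(P, r, gamma)[state] for P in kernels)


if __name__ == "__main__":
    target = worst_case_average(UNCERTAINTY_SET, REWARD)
    for gamma in (0.9, 0.99, 0.999, 0.9999):
        approx = worst_case_scaled_discounted(UNCERTAINTY_SET, REWARD, gamma, state=0)
        print(f"gamma={gamma}: scaled discounted={approx:.6f}, gap={abs(approx - target):.6f}")
```

Running the script prints the gap for increasing discount factors; in this toy setting the gap shrinks as gamma approaches 1. The paper's result concerns equilibria of multi-agent games under its own uncertainty sets, which this single-chain sketch does not capture.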

Paper Structure

This paper contains 40 sections, 59 theorems, 303 equations, 2 algorithms.

Key Result

Lemma 3.1

There exists a DR-MG without any stationary robust NE under average reward.

Theorems & Definitions (107)

  • Lemma 3.1
  • Remark 1
  • Definition 1
  • Lemma 4.1
  • Lemma 4.2: Kakutani's Fixed Point Theorem (Kakutani, 1941)
  • Lemma 4.3
  • Definition 2: Induced Distributionally Robust MDP
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • ...and 97 more