On Feasible Rewards in Multi-Agent Inverse Reinforcement Learning

Till Freihaut, Giorgia Ramponi

TL;DR

This work tackles multi-agent inverse reinforcement learning (MAIRL) by highlighting that observing a single Nash equilibrium is insufficient to identify the underlying rewards, because a game can have multiple equilibria. It introduces entropy-regularized Markov games, which have a unique equilibrium, the quantal response equilibrium (QRE), and develops a theoretical framework for MAIRL in this regime, including an error-propagation analysis and a sample complexity bound under a generative model. The authors prove that, in general, rewards are identifiable only in the average-reward sense, but show that identification becomes possible under the assumption of linearly separable rewards. The results provide a rigorous foundation for MAIRL, offering concrete conditions, bounds, and directions for designing algorithms and studying identifiability in practical multi-agent settings.
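To make the role of entropy regularization concrete, here is a minimal sketch for the simplest case, a single-state two-player (bimatrix) game. This is an assumption made for brevity, not the paper's Markov-game construction. With temperature `tau`, each player's entropy-regularized best response is a softmax of its expected payoffs, and a logit QRE is a fixed point of these responses. When `tau` is large enough the response map is a contraction, so the fixed point is unique and the damped iteration below converges; for small `tau` neither property is guaranteed in general. The function names (`logit_qre`, `payoff_gaps_from_policy`) and the example payoffs are hypothetical.

```python
import numpy as np

def softmax(x):
    z = x - np.max(x)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def logit_qre(A, B, tau=1.0, damping=0.5, tol=1e-12, max_iter=100_000):
    """Logit QRE of a bimatrix game via damped fixed-point iteration (sketch).

    A[i, j] / B[i, j]: payoff to the row / column player when the row player
    picks i and the column player picks j. tau is the entropy coefficient.
    """
    p = np.full(A.shape[0], 1.0 / A.shape[0])
    q = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(max_iter):
        # Entropy-regularized best response: softmax of expected payoffs
        # against the opponent's current mixed strategy.
        p_br = softmax(A @ q / tau)
        q_br = softmax(B.T @ p / tau)
        p_next = (1 - damping) * p + damping * p_br
        q_next = (1 - damping) * q + damping * q_br
        if max(np.abs(p_next - p).max(), np.abs(q_next - q).max()) < tol:
            return p_next, q_next
        p, q = p_next, q_next
    raise RuntimeError("no convergence; try a larger tau or a smaller damping step")

def payoff_gaps_from_policy(pi, tau):
    """At a logit QRE, tau * log(pi[a] / pi[0]) equals the expected payoff of
    action a minus that of action 0, against the opponent's policy."""
    return tau * (np.log(pi) - np.log(pi[0]))

# Usage: a prisoner's-dilemma-style game (actions: 0 = cooperate, 1 = defect).
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = A.T
tau = 1.0
p, q = logit_qre(A, B, tau)
gaps = payoff_gaps_from_policy(p, tau)  # matches (A @ q) - (A @ q)[0]

# Adding any function of the opponent's action to A (same value down each
# column) shifts every row's expected payoff equally, so the QRE is unchanged.
A_shifted = A + np.array([[2.0, -1.0], [2.0, -1.0]])
p2, q2 = logit_qre(A_shifted, B, tau)
```

The last lines illustrate why a unique equilibrium does not by itself pin down the reward: the observed QRE determines each player's expected-payoff differences against the opponent's policy, but `A` and `A_shifted` produce the same equilibrium.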

Abstract

Multi-agent Inverse Reinforcement Learning (MAIRL) aims to recover agent reward functions from expert demonstrations. We characterize the feasible reward set in Markov games, identifying all reward functions that rationalize a given equilibrium. However, equilibrium-based observations are often ambiguous: a single Nash equilibrium can correspond to many reward structures, potentially changing the game's nature in multi-agent systems. We address this by introducing entropy-regularized Markov games, which yield a unique equilibrium while preserving strategic incentives. For this setting, we provide a sample complexity analysis detailing how errors affect learned policy performance. Our work establishes theoretical foundations and practical insights for MAIRL.
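The ambiguity described above already appears in a one-shot two-player game, used here as a simplified stand-in for a Markov game (an assumption, not the paper's construction). A reward pair `(A, B)` is feasible for an observed profile `(p, q)` exactly when neither player can gain by deviating unilaterally, which the hypothetical helper `is_nash` checks. In the example, a prisoner's dilemma, a coordination game, and a constant reward all rationalize the same observed equilibrium, in which both players choose action 1.

```python
import numpy as np

def is_nash(A, B, p, q, tol=1e-9):
    """True if (p, q) is a Nash equilibrium of the bimatrix game (A, B):
    no player can raise its expected payoff by a unilateral deviation."""
    row_value = p @ A @ q
    col_value = p @ B @ q
    return bool(row_value >= (A @ q).max() - tol and col_value >= (B.T @ p).max() - tol)

# Observed equilibrium: both players pick action 1.
p_obs = np.array([0.0, 1.0])
q_obs = np.array([0.0, 1.0])

dilemma = np.array([[3.0, 0.0], [5.0, 1.0]])       # prisoner's dilemma
coordination = np.array([[1.0, 0.0], [0.0, 2.0]])  # coordination game
constant = np.full((2, 2), 0.5)                    # constant reward

for name, A in [("dilemma", dilemma), ("coordination", coordination), ("constant", constant)]:
    B = A.T  # symmetric games: the column player's payoffs are the transpose
    print(name, is_nash(A, B, p_obs, q_obs))
# prints: dilemma True, coordination True, constant True (one per line)
```

In the coordination game, the profile where both players choose action 0 is also an equilibrium, so a reward recovered from this single observation can change the strategic nature of the game.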

Paper Structure

This paper contains 33 sections, 35 theorems, 171 equations, 5 figures, 1 algorithm.

Key Result

Proposition 3.4

Let us consider any MAIRL algorithm $\mathrm{Alg}_{\mathrm{MAIRL}}$ that chooses a reward $\hat{R} \in \mathcal{R}_{(\hat{\mathcal{G}}, \hat{\boldsymbol{\pi}}^{\mathrm{Nash}})}$ that is not constant, i.e., $\hat{R} \neq C$ for every $C \in [-R_{\max}, R_{\max}]$. Furthermore, consider a MARL algorithm
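The statement of Proposition 3.4 is truncated above, so its conclusion is not reproduced here. The sketch below only illustrates why its hypothesis excludes constant rewards, again in a simplified single-state two-player game (an assumption for brevity): under a constant reward every strategy profile is a Nash equilibrium, so such a reward rationalizes any observation and carries no information about the agents. A non-constant reward that rationalizes the observed equilibrium can still admit other equilibria. The helper `pure_nash_equilibria` and the payoff values are hypothetical.

```python
import itertools
import numpy as np

def pure_nash_equilibria(A, B):
    """All pure profiles (i, j) where neither player gains by deviating alone."""
    eqs = []
    for i, j in itertools.product(range(A.shape[0]), range(A.shape[1])):
        row_ok = A[i, j] >= A[:, j].max()  # row player cannot improve on i
        col_ok = B[i, j] >= B[i, :].max()  # column player cannot improve on j
        if row_ok and col_ok:
            eqs.append((i, j))
    return eqs

# Constant reward (within [-R_max, R_max] for R_max = 1): every profile is an equilibrium.
constant_reward = np.full((2, 2), 0.3)
print(pure_nash_equilibria(constant_reward, constant_reward))
# [(0, 0), (0, 1), (1, 0), (1, 1)]

# Non-constant reward consistent with an observed equilibrium (1, 1),
# which nevertheless has a second equilibrium (0, 0).
coord = np.array([[0.5, 0.0], [0.0, 1.0]])
print(pure_nash_equilibria(coord, coord))
# [(0, 0), (1, 1)]
```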

Figures (5)

  • Figure 1: Feasible reward sets for the true set of Nash equilibria, and the recovered feasible reward sets for two different observed Nash equilibria.
  • Figure 2: Failure of single equilibrium observation.
  • Figure 3: Recovered rewards under QRE equilibrium observations.
  • Figure 4: Nash Gap in Grid Games for different transition probabilities.
  • Figure 5: Multi-agent grid world environments with different transition probabilities and the learned Nash equilibrium (NE) path.

Theorems & Definitions (69)

  • Definition 3.1
  • Definition 3.2: Nash Imitation Gap for MAIRL
  • Definition 3.3: Optimality Criterion
  • Proposition 3.4
  • Definition 3.5
  • Lemma 3.6
  • Theorem 3.7: Error propagation
  • Theorem 3.9
  • Theorem 4.1
  • Theorem 4.2: Sample Complexity for Induced Transitions
  • ...and 59 more