Efficiently Quantifying Individual Agent Importance in Cooperative MARL

Omayma Mahjoub; Ruan de Kock; Siddarth Singh; Wiem Khlifi; Abidine Vall; Kale-ab Tessera; Arnu Pretorius

TL;DR

The paper tackles credit attribution in cooperative MARL with a shared global reward by introducing Agent Importance, a scalable metric derived from difference rewards. Agent Importance averages per-timestep difference rewards over $T$ timesteps, $\hat{S}^{AI}_{i}(\Gamma) = \frac{1}{T} \sum_{t=1}^{T} (r^t - r^t_{-i})$, where $r^t$ is the shared team reward at timestep $t$ and $r^t_{-i}$ is the reward obtained without agent $i$'s contribution. Its cost grows linearly with the number of agents, and its values correlate strongly with true Shapley values and, where available, with ground-truth individual agent rewards. Through a case study revisiting a prior MARL benchmark, with experiments on LBF, RWARE, and SMAClite, the authors demonstrate how Agent Importance can diagnose coordination failures, compare algorithmic behaviours, and reveal the impact of parameter sharing and heterogeneity on agent contributions. They discuss limitations and propose future directions for applying the metric in environments that lack no-op actions or feature more diverse agent roles, and they highlight practical implications for scalable MARL benchmarking and explainability.
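The averaging in the formula above can be illustrated with a minimal sketch. It assumes the counterfactual rewards $r^t_{-i}$ have already been collected, for instance by re-stepping a copy of the environment with agent $i$ taking a no-op; that collection step, the function name `agent_importance`, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def agent_importance(team_rewards, rewards_without_agent):
    """Average difference reward per agent over T timesteps.

    team_rewards: shape (T,), the shared reward r^t at each timestep.
    rewards_without_agent: shape (T, n), entry [t, i] is r^t_{-i}, the reward
        at step t without agent i's contribution (assumed precomputed).
    Returns an array of shape (n,) with one score per agent.
    """
    team_rewards = np.asarray(team_rewards, dtype=float)
    rewards_without_agent = np.asarray(rewards_without_agent, dtype=float)
    # Broadcast r^t across agents and subtract each counterfactual reward.
    diffs = team_rewards[:, None] - rewards_without_agent
    # Average over timesteps: (1/T) * sum_t (r^t - r^t_{-i}).
    return diffs.mean(axis=0)


# Hypothetical toy data: 3 timesteps, 2 agents.
toy_team = [1.0, 0.0, 2.0]
toy_without = [[0.0, 1.0], [0.0, 0.0], [1.0, 2.0]]
toy_scores = agent_importance(toy_team, toy_without)  # agent 0: 2/3, agent 1: 0
```

In this toy data, removing agent 1 never changes the team reward, so its score is zero, while agent 0 accounts for a reward drop at two of the three timesteps.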

Abstract

Measuring the contribution of individual agents is challenging in cooperative multi-agent reinforcement learning (MARL). In cooperative MARL, team performance is typically inferred from a single shared global reward. Arguably, among the best current approaches to effectively measure individual agent contributions is to use Shapley values. However, calculating these values is expensive as the computational complexity grows exponentially with respect to the number of agents. In this paper, we adapt difference rewards into an efficient method for quantifying the contribution of individual agents, referred to as Agent Importance, offering a linear computational complexity relative to the number of agents. We show empirically that the computed values are strongly correlated with the true Shapley values, as well as the true underlying individual agent rewards, used as the ground truth in environments where these are available. We demonstrate how Agent Importance can be used to help study MARL systems by diagnosing algorithmic failures discovered in prior MARL benchmarking work. Our analysis illustrates Agent Importance as a valuable explainability component for future MARL benchmarks.
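To make the complexity contrast in the abstract concrete, the sketch below computes exact Shapley values for a toy coalition value function and counts how many evaluations each approach needs. The value function, the helper names `shapley_values` and `evaluation_counts`, and the additive toy game are hypothetical and chosen only for illustration; in MARL the coalition value would come from environment rollouts rather than a closed-form function.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_agents, value):
    """Exact Shapley values for a coalition value function value(frozenset)."""
    agents = range(n_agents)
    result = []
    for i in agents:
        others = [a for a in agents if a != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for coalitions of size k that exclude i.
            weight = factorial(k) * factorial(n_agents - k - 1) / factorial(n_agents)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Marginal contribution of agent i to this coalition.
                total += weight * (value(coalition | {i}) - value(coalition))
        result.append(total)
    return result


def evaluation_counts(n_agents):
    """Return (coalitions for exact Shapley, evaluations for a leave-one-out score)."""
    # Exact Shapley needs every coalition; leave-one-out needs the full team
    # plus one counterfactual per agent.
    return 2 ** n_agents, n_agents + 1


# Hypothetical additive game: each agent contributes a fixed amount.
toy_levels = [1, 2, 3]
toy_shapley = shapley_values(3, lambda s: float(sum(toy_levels[a] for a in s)))
```

For an additive game like this one, each agent's Shapley value equals its own contribution, which makes the result easy to check by hand. The count returned by `evaluation_counts` doubles with every added agent for exact Shapley values, but grows by one for the leave-one-out approach.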

Paper Structure (30 sections, 3 equations, 54 figures, 13 tables, 1 algorithm)

Introduction
Related Work
Agent Importance
Case Study: using Agent Importance to analyse a prior benchmark
Results
Validating Agent Importance
Applications of Agent Importance
Discussion
Experimental details
Environments
Level Based Foraging
Multi-Robot Warehouse
Algorithms Details
Q-learning
Policy Gradients (PG)
...and 15 more sections

Figures (54)

Figure 1: Left: Multi-Robot Warehouse (RWARE). Middle: Level-Based Foraging (LBF). Right: SMAClite
Figure 2: Correlation analysis for agents $\{a_0, a_1, a_2, a_3\}$, for each metric: Agent Importance $i$, Shapley Value $s$, and Individual Reward $r$ using the VDN algorithm. (a) Heatmap of correlations among metrics. Top: LBF 15x15-4p-5f. Bottom: RWARE small-4ag. (b) Matching rankings comparison on LBF 15x15-4p-5f. (c) Matching rankings comparison on RWARE small-4ag. The legend indicates which metric is being compared to the individual agent rewards.
Figure 3: Computational cost of computing the agent importance and the Shapley value.
Figure 4: Agent importance scores on the deterministic LBF scenario for MAA2C, MAPPO, VDN and QMIX. Agents $0$, $1$ and $2$ are assigned fixed levels of $1$, $2$ and $3$ respectively, implying that their contributions should be weighted accordingly.
Figure 5: Algorithm performance on LBF and RWARE including probability of improvement, performance profiles and sample efficiency curves. Top two rows: Performance of algorithms on 7 LBF tasks. Bottom two rows: Performance of all algorithms on 3 RWARE tasks.
...and 49 more figures
