Fault Tolerant Multi-Agent Learning with Adversarial Budget Constraints

David Mguni, Yaqi Sun, Haojun Chen, Wanrong Yang, Amir Darabi, Larry Olanrewaju Orimoloye, Yaodong Yang

TL;DR

MARTA is a plug-and-play robustness layer that augments standard MARL algorithms with a Switcher-Adversary mechanism, which selectively induces malfunctions in performance-critical states. The results establish it as a theoretically grounded and practically deployable mechanism for fault-tolerant MARL.

Abstract

We study robustness to agent malfunctions in cooperative multi-agent reinforcement learning (MARL), a failure mode that is critical in practice yet underexplored in existing theory. We introduce MARTA, a plug-and-play robustness layer that augments standard MARL algorithms with a Switcher-Adversary mechanism which selectively induces malfunctions in performance-critical states. This formulation defines a fault-switching $(N+2)$-player Markov game in which the Switcher chooses when and which agent fails, and the Adversary controls the resulting faulty behaviour via random or worst-case policies. We develop a Q-learning-type scheme and show that the associated Bellman operator is a contraction, yielding existence and uniqueness of the minimax value and convergence to a Markov perfect equilibrium. MARTA integrates with MARL algorithms without architectural modification and consistently improves robustness across Traffic Junction (TJ), Level-Based Foraging (LBF), MPE SimpleTag, and SMAC (v2). In these domains, MARTA achieves gains in final performance of up to 116.7% in SMAC, 21.4% in MPE SimpleTag, and 44.6% in LBF, while significantly reducing failure rates under train-test mismatched fault regimes. These results establish MARTA as a theoretically grounded and practically deployable mechanism for fault-tolerant MARL.
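
To make the Switcher-Adversary idea concrete, the following is a minimal sketch of a single fault-injection step. It is not the authors' implementation: the function name `inject_fault`, the callable signatures, and the integer `budget` counter are all assumptions introduced for illustration. The abstract only says that the Switcher decides when and which agent fails and that the Adversary controls the faulty behaviour.

```python
import random
from typing import Any, Callable, List, Optional, Sequence, Tuple


def inject_fault(
    state: Any,
    joint_action: Sequence[int],
    switcher: Callable[[Any, int], Optional[int]],
    adversary: Callable[[int, int], int],
    budget: int,
    n_actions: int,
) -> Tuple[List[int], int]:
    """Apply at most one fault to a joint action.

    Hypothetical interface (not from the paper):
      switcher(state, budget) -> index of the agent to fail, or None for no fault
      adversary(agent, n_actions) -> action executed by the faulty agent
    Returns the (possibly modified) joint action and the remaining budget.
    """
    actions = list(joint_action)
    if budget <= 0:
        # Assumed budget rule: no faults once the budget is spent.
        return actions, budget
    target = switcher(state, budget)
    if target is None:
        return actions, budget
    # The Adversary overrides the chosen agent's action.
    actions[target] = adversary(target, n_actions)
    return actions, budget - 1


def random_adversary(agent: int, n_actions: int) -> int:
    """A 'random' faulty policy: pick any action uniformly."""
    return random.randrange(n_actions)
```

A wrapper of this shape sits between the base MARL policy and the environment, which is one way to read "without architectural modification": the base learner's networks are untouched and only the executed joint action changes.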

Paper Structure

This paper contains 16 sections, 24 theorems, 70 equations, 10 figures, 3 tables, 3 algorithms.

Key Result

Theorem 3.1

The minimax value of ${\mathcal{G}}$ exists and is unique, i.e. there exists a function $v^\ast:{\mathcal{S}}\to\mathbb{R}$ such that $v^\ast(s):=\min\limits_{\hat{\mathfrak{g}}}\max\limits_{\hat{\boldsymbol{\pi}}\in \bm\Pi} v(s|\hat{\boldsymbol{\pi}},\hat{\mathfrak{g}})=\max\limits_{\hat{\boldsymbol{\pi}}}\cdots$
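
The abstract attributes this result to the Bellman operator being a contraction. The sketch below checks that property numerically for a generic tabular zero-sum backup, $(Tv)(s)=\min_g\max_a\,[\,r(s,a,g)+\gamma\sum_{s'}P(s'|s,a,g)\,v(s')\,]$, on a small random game. The game, its sizes, and the names `minimax_backup`, `sup_dist`, and `random_game` are assumptions made for illustration; this is not the paper's operator, and it takes the min and max over pure actions, so it illustrates only the contraction, not the min-max/max-min equality.

```python
import random


def minimax_backup(v, reward, trans, gamma):
    """One application of T: min over adversary moves g, max over team moves a."""
    new_v = []
    for s in range(len(v)):
        n_a, n_g = len(reward[s]), len(reward[s][0])
        new_v.append(min(
            max(
                reward[s][a][g]
                + gamma * sum(p * v[t] for t, p in enumerate(trans[s][a][g]))
                for a in range(n_a)
            )
            for g in range(n_g)
        ))
    return new_v


def sup_dist(u, v):
    """Sup-norm distance between two value vectors."""
    return max(abs(x - y) for x, y in zip(u, v))


def random_game(n_states, n_a, n_g, rng):
    """Random rewards in [-1, 1] and random row-stochastic transitions."""
    reward = [[[rng.uniform(-1, 1) for _ in range(n_g)]
               for _ in range(n_a)] for _ in range(n_states)]
    trans = []
    for _ in range(n_states):
        per_a = []
        for _ in range(n_a):
            per_g = []
            for _ in range(n_g):
                w = [rng.random() + 1e-9 for _ in range(n_states)]
                total = sum(w)
                per_g.append([x / total for x in w])
            per_a.append(per_g)
        trans.append(per_a)
    return reward, trans


rng = random.Random(0)
gamma = 0.9
reward, trans = random_game(4, 3, 2, rng)
u = [rng.uniform(-10, 10) for _ in range(4)]
v = [rng.uniform(-10, 10) for _ in range(4)]

before = sup_dist(u, v)
after = sup_dist(minimax_backup(u, reward, trans, gamma),
                 minimax_backup(v, reward, trans, gamma))
print(after <= gamma * before)  # contraction: ||Tu - Tv|| <= gamma * ||u - v||
```

Because $T$ shrinks sup-norm distances by at least a factor $\gamma$, repeated application from any starting vector converges to a single fixed point, which is the mechanism behind the existence and uniqueness claim.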

Figures (10)

  • Figure 1: Robustness against malfunctions. Each plot compares a base MARL algorithm with and without MARTA in faulty agent settings. In all scenarios, MARTA improves robustness.
  • Figure 2: Ablation results in MPE.
  • Figure 3: Performance of MARTA+VDN. MARTA improves performance in all scenarios.
  • Figure 4: Comparison between MADDPG with M3DDPG and MARTA-MADDPG in MPE.
  • Figure 5: Evaluation of MARTA and EIR across four environments under two fault regimes. Case 1 (fixed fault): a single agent fails with fixed probability, with aligned train–test distributions. Case 2 (dynamic random fault): at test time, any agent may malfunction at any timestep, inducing a train–test distribution mismatch. Top row reports risk metrics (lower is better); bottom row reports success and coordination metrics (higher is better).
  • ...and 5 more figures

Theorems & Definitions (41)

  • Theorem 3.1
  • Proposition 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Theorem 4.1
  • Theorem 4.2
  • Definition 5.1
  • Definition 5.2
  • Lemma 5.3
  • Lemma 5.4
  • ...and 31 more