Data Poisoning to Fake a Nash Equilibrium in Markov Games
Young Wu, Jeremy McMahan, Xiaojin Zhu, Qiaomin Xie
TL;DR
The paper studies offline data poisoning in two-player zero-sum Markov games and shows that an attacker can force learners to adopt a target pure Markov-perfect Nash equilibrium (NE) by modifying the training data. It introduces the unique Nash set (UN), the set of Q-functions under which the target policy is the unique NE, and the attacker's Theory of Mind (ToM), the set of games the learners consider plausible given the data. The attack succeeds exactly when the poisoned ToM lies inside the UN. To make the problem tractable, the authors replace the UN with an $\iota$-strict UN and the ToM with a linear outer approximation, which reduces the attack to a linear program that minimizes the $L_1$ data modification cost. They give both normal-form and Markov-game formulations. Experiments on Rock Paper Scissors and stochastic Matching Pennies show that the attack can install the target NE at lower cost than baselines, even under partial data coverage, which highlights potential vulnerabilities in offline MARL systems. The findings clarify the geometric structure of data poisoning threats and offer tractable tools for evaluating and mitigating them when designing robust MARL algorithms.
Abstract
We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium for a two-player zero-sum Markov game. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside the set. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the unique Nash set and the set of plausible games induced by data are polytopes in the Q function space. We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.
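To make the membership condition concrete, the following minimal Python sketch checks whether a single normal-form zero-sum game has a given pure action pair as a strict saddle point with margin $\iota$. It rests on assumptions not stated in this section: the function name `is_iota_strict_pure_ne` is hypothetical, the row player is taken to maximize `Q`, and "$\iota$-strict" is read as "every unilateral deviation costs at least $\iota$". It tests one game only, whereas the attack described above requires the condition to hold for every plausible game.

```python
def is_iota_strict_pure_ne(Q, i_star, j_star, iota):
    """Return True if (i_star, j_star) is a saddle point of Q with margin iota.

    Assumption: Q[i][j] is the payoff to the row player, who maximizes;
    the column player receives -Q[i][j] and therefore minimizes Q.
    """
    v = Q[i_star][j_star]
    # The row player must lose at least iota by leaving i_star.
    row_ok = all(v - Q[i][j_star] >= iota
                 for i in range(len(Q)) if i != i_star)
    # The column player must lose at least iota by leaving j_star,
    # i.e. Q must rise by at least iota.
    col_ok = all(Q[i_star][j] - v >= iota
                 for j in range(len(Q[0])) if j != j_star)
    return row_ok and col_ok


# Rock Paper Scissors has no pure Nash equilibrium.
rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

# An illustrative (made-up) modified payoff matrix in which
# (action 0, action 0) is a strict saddle point.
modified = [[0, 1, 1],
            [-1, 0, -1],
            [-1, 1, 0]]

rps_has_pure_ne = any(is_iota_strict_pure_ne(rps, i, j, 0.5)
                      for i in range(3) for j in range(3))
modified_ok = is_iota_strict_pure_ne(modified, 0, 0, 0.5)
```

Because each inequality is linear in the entries of `Q`, conditions of this kind can be written as linear constraints, which is consistent with the abstract's statement that the attack can be computed by a linear program.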
