Data Poisoning to Fake a Nash Equilibrium in Markov Games

Young Wu; Jeremy McMahan; Xiaojin Zhu; Qiaomin Xie

TL;DR

The paper addresses offline data poisoning in two-player zero-sum Markov games, showing an attacker can force learners to adopt a target pure Markov-perfect Nash equilibrium by poisoning training data. It introduces the Unique Nash Set (UN) of Q-functions and the attacker’s Theory of Mind (ToM) to model plausible learner beliefs, then reduces the problem to a linear program via an $\iota$-strict UN and a linear outer approximation of ToM. The authors provide both normal-form and Markov-game formulations, with a linear-program-based attack that minimizes $L_{1}$ data modification cost while ensuring the poisoned ToM lies inside the UN. Experimental results on Rock Paper Scissors and stochastic Matching Penny demonstrate the approach can install the target NE with lower cost than baselines, even under partial data coverage, highlighting potential vulnerabilities in offline MARL systems. These findings inform the design of robust MARL algorithms by clarifying the geometric structure of data-poisoning threats and offering tractable tools to evaluate and mitigate them.
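Schematically, the attack described above can be written as a single optimization problem. The notation here is an assumption made for illustration and is not taken from the paper: $r$ is the vector of original rewards in the data set, $r'$ the poisoned rewards, $\pi^{\dagger}$ the target pure policy, $\overline{\mathrm{ToM}}(r')$ the linear outer approximation of the learner's plausible games, and $\mathrm{UN}_{\iota}(\pi^{\dagger})$ the $\iota$-strict unique Nash set.

$$
\min_{r'} \; \lVert r' - r \rVert_{1}
\quad \text{subject to} \quad
\overline{\mathrm{ToM}}(r') \subseteq \mathrm{UN}_{\iota}(\pi^{\dagger})
$$

Because both sets are described by linear inequalities, and an $L_{1}$ objective can be linearized with auxiliary variables in the standard way, a problem of this shape can be posed as a linear program.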

Abstract

We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium for a two-player zero-sum Markov game. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside the set. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the inverse Nash set and the set of plausible games induced by data are polytopes in the Q function space. We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.
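The success condition in the abstract (every plausible game must lie inside the unique Nash set) can be illustrated with a minimal Python sketch for a zero-sum normal-form game. Everything named here is an assumption made for illustration: the plausible games are taken to be a box `q_hat ± rho` around an estimated payoff matrix, and membership is tested with a strict saddle-point condition with margin `iota`, which is sufficient for the target to be the unique Nash equilibrium of a zero-sum game. The function names, the box model, and the poisoned matrix are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def box(q_hat: Matrix, rho: float) -> Tuple[Matrix, Matrix]:
    """Hypothetical plausible-game set: every entry may vary by +/- rho."""
    lo = [[q - rho for q in row] for row in q_hat]
    hi = [[q + rho for q in row] for row in q_hat]
    return lo, hi


def box_inside_unique_nash(q_lo: Matrix, q_hi: Matrix,
                           target: Tuple[int, int], iota: float) -> bool:
    """Check that every Q in [q_lo, q_hi] has `target` as a saddle point
    with margin at least iota (iota > 0).

    Q holds the row player's reward; the column player receives -Q.
    Entries vary independently inside the box, so it is enough to test
    the worst case of each inequality.
    """
    i, j = target
    n_rows, n_cols = len(q_lo), len(q_lo[0])
    # Column player (minimizer): any deviation must raise Q by >= iota.
    col_ok = all(q_lo[i][b] - q_hi[i][j] >= iota
                 for b in range(n_cols) if b != j)
    # Row player (maximizer): any deviation must lower Q by >= iota.
    row_ok = all(q_lo[i][j] - q_hi[a][j] >= iota
                 for a in range(n_rows) if a != i)
    return row_ok and col_ok


# Rock Paper Scissors payoffs for the row player (Rock, Paper, Scissors).
rps = [[0.0, -1.0, 1.0],
       [1.0, 0.0, -1.0],
       [-1.0, 1.0, 0.0]]

# A hypothetical poisoned estimate that makes (Rock, Rock) a strict saddle.
poisoned = [[0.0, 1.0, 1.0],
            [-1.0, 0.0, -1.0],
            [-1.0, 1.0, 0.0]]

target = (0, 0)
clean_ok = box_inside_unique_nash(*box(rps, 0.2), target, iota=0.1)
attack_ok = box_inside_unique_nash(*box(poisoned, 0.2), target, iota=0.1)
wide_ok = box_inside_unique_nash(*box(poisoned, 0.5), target, iota=0.1)
print(clean_ok, attack_ok, wide_ok)
```

In this toy setting the check fails on the clean estimate, passes on the poisoned one when the uncertainty box is narrow, and fails again when the box is wide. The last case shows why the attacker has to reason about the whole set of plausible games rather than a single point estimate.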

Paper Structure (14 sections, 5 theorems, 24 equations, 2 figures, 6 tables)

Introduction
Offline Attack on a Normal-form Game
The Unique Nash Set (UN) of a Normal-form Game
The Attacker's Theory of Mind (ToM) for Offline Normal-form Game Learners
The Cheapest Way to Move ToM into UN for Normal-form Games
Offline Attack on a Markov Game
The Unique Nash Set (UN) of a Markov Game
The Attacker's Theory of Mind (ToM) for Offline Multi-Agent Reinforcement Learners
The Cheapest Way to Move ToM into UN for Markov Games
Experiments
Rock Paper Scissors
Stochastic Matching Penny
Discussions
Acknowledgments

Key Result

Proposition 1

For any pure strategy profile $\pi$,

Figures (2)

Figure 1: Attacker's Problem
Figure 2: Distribution of rewards

Theorems & Definitions (27)

Definition 1: Nash Equilibrium
Definition 2: Unique Nash
Proposition 1: Unique Nash Polytope
Definition 3: Iota Strict Unique Nash
Definition 4: Theory of Mind
Definition 5: Outer Approximation of Theory of Mind
Example 1: Theory of Mind for Maximum Likelihood Victims
Example 2: Theory of Mind for Pessimistic Optimistic Victims
Example 3: Theory of Mind for Data Splitting Victims
Example 4: $L_{1}$ Cost Function
...and 17 more
