Data Poisoning to Fake a Nash Equilibrium in Markov Games

Young Wu, Jeremy McMahan, Xiaojin Zhu, Qiaomin Xie

TL;DR

The paper studies offline data poisoning in two-player zero-sum Markov games and shows that an attacker can force learners to adopt a target pure Markov-perfect Nash equilibrium by poisoning the training data. It introduces the Unique Nash Set (UN) of Q-functions and the attacker’s Theory of Mind (ToM), which models the learners' plausible beliefs, then reduces the problem to a linear program via an $\iota$-strict UN and a linear outer approximation of the ToM. The authors give both normal-form and Markov-game formulations, with a linear-program-based attack that minimizes the $L_{1}$ data modification cost while ensuring that the poisoned ToM lies inside the UN. Experiments on Rock Paper Scissors and stochastic Matching Pennies show that the approach can install the target NE at lower cost than baselines, even under partial data coverage, which highlights potential vulnerabilities in offline MARL systems. These findings inform the design of robust MARL algorithms by clarifying the geometric structure of data-poisoning threats and offering tractable tools to evaluate and mitigate them.

Abstract

We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium for a two-player zero-sum Markov game. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside the set. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the inverse Nash set and the set of plausible games induced by data are polytopes in the Q function space. We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.
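
The success condition in the abstract (every plausible game must lie inside the unique Nash set) can be illustrated for the normal-form, single-state case. The sketch below is not the paper's linear program. It assumes that the row player maximizes and the column player minimizes, that the target is a pure profile, and that the plausible games form an entrywise box around an estimated payoff matrix. The function names `is_iota_strict_unique_nash`, `box_around`, and `box_inside_unique_nash` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def is_iota_strict_unique_nash(q: Matrix, target: Tuple[int, int], iota: float) -> bool:
    """Check that the pure profile `target` is a saddle point of q with margin iota.

    Assumed convention: the row player maximizes q, the column player minimizes it.
    With iota > 0 every unilateral deviation is strictly worse, so the target
    is a strict (and, in a zero-sum game, unique) Nash equilibrium.
    """
    if iota <= 0:
        raise ValueError("iota must be positive")
    i_star, j_star = target
    value = q[i_star][j_star]
    # Row deviations must lower the payoff by at least iota.
    row_ok = all(value - q[i][j_star] >= iota
                 for i in range(len(q)) if i != i_star)
    # Column deviations must raise the payoff by at least iota.
    col_ok = all(q[i_star][j] - value >= iota
                 for j in range(len(q[0])) if j != j_star)
    return row_ok and col_ok


def box_around(q: Matrix, radius: float) -> Tuple[Matrix, Matrix]:
    """Entrywise box [q - radius, q + radius], a stand-in for the plausible games."""
    lo = [[x - radius for x in row] for row in q]
    hi = [[x + radius for x in row] for row in q]
    return lo, hi


def box_inside_unique_nash(lo: Matrix, hi: Matrix,
                           target: Tuple[int, int], iota: float) -> bool:
    """Check that every game in the box satisfies the iota-strict conditions.

    The constraints are linear, so comparing worst-case corners is enough.
    """
    i_star, j_star = target
    row_ok = all(lo[i_star][j_star] - hi[i][j_star] >= iota
                 for i in range(len(lo)) if i != i_star)
    col_ok = all(lo[i_star][j] - hi[i_star][j_star] >= iota
                 for j in range(len(lo[0])) if j != j_star)
    return row_ok and col_ok


# Rock Paper Scissors has no pure equilibrium, so the check fails for any cell.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
# A hypothetical payoff matrix in which (0, 0) is a strict saddle point.
saddle = [[2, 3, 4], [1, 0, 5], [0, 1, 6]]
lo, hi = box_around(saddle, 0.25)
```

In this picture, an attack would modify the data until the whole box passes `box_inside_unique_nash`. Because each check is a linear inequality in the payoff entries, a linear program can search for the cheapest such modification.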

Paper Structure (14 sections, 5 theorems, 24 equations, 2 figures, 6 tables)

Key Result

Proposition 1

For any pure strategy profile $\pi$,

Figures (2)

  • Figure 1: Attacker's Problem
  • Figure 2: Distribution of rewards

Theorems & Definitions (27)

  • Definition 1: Nash Equilibrium
  • Definition 2: Unique Nash
  • Proposition 1: Unique Nash Polytope
  • Definition 3: Iota Strict Unique Nash
  • Definition 4: Theory of Mind
  • Definition 5: Outer Approximation of Theory of Mind
  • Example 1: Theory of Mind for Maximum Likelihood Victims
  • Example 2: Theory of Mind for Pessimistic Optimistic Victims
  • Example 3: Theory of Mind for Data Splitting Victims
  • Example 4: $L_{1}$ Cost Function
  • ...and 17 more