Corruption-Robust Offline Two-Player Zero-Sum Markov Games

Andi Nika; Debmalya Mandal; Adish Singla; Goran Radanović

TL;DR

This work proposes robust versions of the Pessimistic Minimax Value Iteration algorithm, both under coverage on the corrupted data and under coverage only on the clean data, and shows that they achieve (near)-optimal suboptimality gap bounds with respect to $\epsilon$.

Abstract

We study data corruption robustness in offline two-player zero-sum Markov games. Given a dataset of realized trajectories of two players, an adversary is allowed to modify an $\epsilon$-fraction of it. The learner's goal is to identify an approximate Nash Equilibrium policy pair from the corrupted data. We consider this problem in linear Markov games under different degrees of data coverage and corruption. We start by providing an information-theoretic lower bound on the suboptimality gap of any learner. Next, we propose robust versions of the Pessimistic Minimax Value Iteration algorithm, both under coverage on the corrupted data and under coverage only on the clean data, and show that they achieve (near)-optimal suboptimality gap bounds with respect to $\epsilon$. We note that we are the first to provide such a characterization of the problem of learning approximate Nash Equilibrium policies in offline two-player zero-sum Markov games under data corruption.
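To make the corruption model in the abstract concrete, the following is a minimal sketch of an adversary that modifies at most an $\epsilon$-fraction of an offline dataset. Everything here is an illustrative assumption rather than the paper's construction: the `Transition` layout, the `corrupt_dataset` helper, and the `attack` callback are hypothetical names, and the corrupted entries are picked at random, whereas the adversary in the paper may choose which samples to modify.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical record layout: (state, action_player1, action_player2, reward, next_state).
Transition = Tuple[int, int, int, float, int]


def corrupt_dataset(
    data: List[Transition],
    epsilon: float,
    attack: Callable[[Transition], Transition],
    seed: int = 0,
) -> List[Transition]:
    """Return a copy of `data` with at most floor(epsilon * len(data)) entries replaced.

    Simplifying assumption: the corrupted indices are sampled uniformly at random.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    rng = random.Random(seed)
    budget = int(epsilon * len(data))  # number of entries the adversary may touch
    corrupted = list(data)             # leave the clean dataset untouched
    for i in rng.sample(range(len(data)), budget):
        corrupted[i] = attack(corrupted[i])
    return corrupted


def shift_reward(t: Transition) -> Transition:
    """Example attack (hypothetical): add 1 to the observed reward."""
    s, a1, a2, r, s_next = t
    return (s, a1, a2, r + 1.0, s_next)
```

For example, with a clean dataset of 8 transitions and `epsilon = 0.25`, calling `corrupt_dataset(clean, 0.25, shift_reward)` changes exactly 2 entries and leaves the other 6 as they were. The learner only ever sees the returned list and does not know which entries were modified.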

Paper Structure (26 sections, 26 theorems, 149 equations, 1 figure, 1 table, 1 algorithm)

INTRODUCTION
PRELIMINARIES
Two-player Zero-sum Markov Games
Nash Equilibria and Performance Metrics
Linear Markov Games
Offline Data Collection
Corruption Robust Estimation
PROBLEM FORMULATION
RESULTS UNDER COVERAGE ON CORRUPTED DATA
Uniform $\Sigma$-Coverage and Corrupted Covariates
LRU Coverage and Clean Covariates
RESULTS UNDER COVERAGE ON CLEAN DATA
DISCUSSION ON MARL COVERAGE ASSUMPTIONS
RELATED WORK
CONCLUSION
...and 11 more sections

Key Result

Theorem 1

For every algorithm $L$, there exist a Markov game $\mathcal{G}$, an instance of the corrupted dataset, a corruption level $\epsilon$, and a data-collecting distribution $\rho$ such that, with probability at least $1/4$, the policy pair returned by $L$ is no better than an $\Omega(Hd\epsilon)$-approximate NE policy pair.
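As a rough illustration of what an "approximate NE policy pair" measures, the sketch below computes the standard duality gap of a strategy pair in a zero-sum matrix game, which can be viewed as a Markov game with a single state and a single step. This is an assumption-laden toy: the convention that player 1 maximizes and player 2 minimizes, and the `nash_gap` helper itself, are introduced here for illustration, and the paper's own suboptimality gap is defined in its "Nash Equilibria and Performance Metrics" section, which is not reproduced here.

```python
from typing import Sequence


def nash_gap(
    payoff: Sequence[Sequence[float]],
    x: Sequence[float],
    y: Sequence[float],
) -> float:
    """Duality gap of mixed strategies (x, y) in a zero-sum matrix game.

    Assumed convention: payoff[i][j] is the reward to player 1 (the maximizer)
    when player 1 plays row i and player 2 (the minimizer) plays column j.
    The gap is zero exactly at a Nash equilibrium and positive otherwise.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Value player 1 could secure by best-responding to y.
    best_vs_y = max(
        sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)
    )
    # Value x actually guarantees when player 2 best-responds to x.
    worst_for_x = min(
        sum(x[i] * payoff[i][j] for i in range(n_rows)) for j in range(n_cols)
    )
    return best_vs_y - worst_for_x


# Matching pennies: the uniform strategy pair is the unique equilibrium.
MATCHING_PENNIES = [[1.0, -1.0], [-1.0, 1.0]]
```

In matching pennies the uniform pair `([0.5, 0.5], [0.5, 0.5])` has gap 0, while the pair `([1.0, 0.0], [0.5, 0.5])` has gap 1, because player 2 can exploit the deterministic row choice. Under this reading, a lower bound such as the one in Theorem 1 says that no learner can push this kind of gap below order $Hd\epsilon$ on every instance.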

Figures (1)

Figure 1: Relationship between coverage assumptions. The minimal coverage requirements are single policy coverage and LRU coverage for the single-player and two-player settings, respectively. Arrows stand for implications. The middle (dashed) arrow denotes the restriction from the two-player to the single-player setting: when fixing the second player's policy, the LRU coverage assumption reduces to the uniform coverage assumption.

Theorems & Definitions (51)

Definition 1
Definition 2: Linear Markov games
Definition 3: Compliance of dataset
Definition 4
Theorem 1
Theorem 2
Remark 1
Remark 2
Lemma 1: chen2022online
Theorem 3
...and 41 more
