Corrupted Learning Dynamics in Games

Taira Tsuchiya, Shinji Ito, Haipeng Luo

TL;DR

The paper presents corrupted learning dynamics that adaptively find an equilibrium at a rate depending on how far each player deviates from the strategy suggested by the prescribed algorithm, while matching the best existing bound in the honest regime.

Abstract

Learning in games refers to scenarios where multiple players interact in a shared environment, each aiming to minimize their regret. An equilibrium can be computed at a fast rate of $O(1/T)$ when all players run optimistic follow-the-regularized-leader (OFTRL). However, this acceleration is limited to the honest regime, in which all players adhere to a prescribed algorithm, a situation that may not be realistic in practice. To address this issue, we present corrupted learning dynamics that adaptively find an equilibrium at a rate that depends on the extent to which each player deviates from the strategy suggested by the prescribed algorithm. First, in two-player zero-sum corrupted games, we provide learning dynamics for which the external regret of the $x$-player (and similarly of the $y$-player) is roughly bounded by $O(\log (m_x m_y) + \sqrt{\hat{C}_y} + \hat{C}_x)$, where $m_x$ and $m_y$ denote the numbers of actions of the $x$- and $y$-players, respectively, and $\hat{C}_x$ and $\hat{C}_y$ represent their cumulative deviations. We then extend our approach to multi-player general-sum corrupted games, providing learning dynamics for which the swap regret of player $i$ is bounded by $O(\log T + \sqrt{\sum_{k} \hat{C}_k \log T} + \hat{C}_i)$, ignoring dependence on the number of players and actions, where $\hat{C}_i$ is the cumulative deviation of player $i$ from the prescribed algorithm. Our learning dynamics are agnostic to the levels of corruption. A key technical contribution is a new analysis that ensures the stability of a Markov chain under a new adaptive learning rate, thereby allowing us to achieve the desired bound in the corrupted regime while matching the best existing bound in the honest regime. Notably, our framework extends to corruption not only in strategies but also in the observed expected utilities, and we provide several matching lower bounds.
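The abstract's starting point is that an equilibrium is found at rate $O(1/T)$ when all players run OFTRL. The sketch below illustrates that honest regime under stated assumptions: both players of a zero-sum matrix game run OFTRL with a negative-entropy regularizer (optimistic Hedge), predicting the next loss by the previous one, with a fixed learning rate `eta`. The fixed rate is a simplification for illustration and is not the paper's adaptive learning rate; the function names and the convention that the $x$-player minimizes $x^\top A y$ are likewise assumptions.

```python
import numpy as np

def neg_exp_weights(v, eta):
    """Distribution proportional to exp(-eta * v), computed stably."""
    z = -eta * v
    w = np.exp(z - z.max())
    return w / w.sum()

def optimistic_hedge_selfplay(A, T, eta):
    """Both players run OFTRL with a negative-entropy regularizer (optimistic Hedge),
    predicting the next loss by the previous one. The x-player minimizes x^T A y and
    the y-player maximizes it. Returns the x-player's external regret and the duality
    gap of the time-averaged strategies."""
    m_x, m_y = A.shape
    cum_x, cum_y = np.zeros(m_x), np.zeros(m_y)    # cumulative loss vectors
    last_x, last_y = np.zeros(m_x), np.zeros(m_y)  # optimistic predictions
    xs, ys, losses_x = [], [], []
    for _ in range(T):
        x = neg_exp_weights(cum_x + last_x, eta)
        y = neg_exp_weights(cum_y + last_y, eta)
        loss_x = A @ y        # x-player's loss vector
        loss_y = -(A.T @ x)   # y-player's loss vector (it maximizes x^T A y)
        cum_x += loss_x
        cum_y += loss_y
        last_x, last_y = loss_x, loss_y
        xs.append(x)
        ys.append(y)
        losses_x.append(loss_x)
    xs, ys, losses_x = np.array(xs), np.array(ys), np.array(losses_x)
    # External regret against the best fixed action in hindsight.
    regret_x = np.sum(xs * losses_x) - losses_x.sum(axis=0).min()
    x_bar, y_bar = xs.mean(axis=0), ys.mean(axis=0)
    # Duality gap of the averaged strategies: zero exactly at a Nash equilibrium.
    gap = (x_bar @ A).max() - (A @ y_bar).min()
    return regret_x, gap
```

For instance, on the hypothetical game `A = np.array([[0.2, 0.9], [0.7, 0.3]])`, whose unique equilibrium is $x^\star = (4/11, 7/11)$, $y^\star = (6/11, 5/11)$, calling `optimistic_hedge_selfplay(A, 5000, 0.1)` should return a small non-negative gap. The gap equals the sum of both players' external regrets divided by $T$, which is why bounded regrets translate into an $O(1/T)$ rate.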

Paper Structure

This paper contains 46 sections, 28 theorems, 106 equations, 2 tables, 1 algorithm.

Key Result

Theorem 1

In two-player zero-sum games, there exist $(\widehat{C}_x, \widehat{C}_y)$-agnostic learning dynamics such that the (external) regret of the $x$-player is bounded by $\sqrt{ \log(m_x) \log(m_x m_y) }$ in the honest regime and by $\min\{ \sqrt{ \log(m_x) ( \log(m_x m_y) + \widehat{C}_x + \ldots$ …
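Theorem 1's corrupted bound is stated in terms of the cumulative deviations $\widehat{C}_x$ and $\widehat{C}_y$. The hypothetical simulation below shows one way such a deviation could be measured: in some rounds the $x$-player ignores the strategy its prescribed algorithm suggests and plays a fixed pure action instead. It assumes $\widehat{C}_x = \sum_t \|x_t - \hat{x}_t\|_1$, where $\hat{x}_t$ is the suggested strategy and $x_t$ the one actually played; the paper's Definition 3 may use a different norm, and it also tracks a utility-corruption level $\widetilde{C}_i$ that is not modeled here. The learning rule is fixed-rate optimistic Hedge, not the paper's adaptive dynamics, and all names are illustrative.

```python
import numpy as np

def neg_exp_weights(v, eta):
    """Distribution proportional to exp(-eta * v), computed stably."""
    z = -eta * v
    w = np.exp(z - z.max())
    return w / w.sum()

def corrupted_selfplay(A, T, eta, corrupt_rounds, corrupt_action):
    """Optimistic-Hedge self-play in which the x-player deviates in the rounds listed
    in `corrupt_rounds` by playing the pure action `corrupt_action`.

    Returns the x-player's cumulative deviation (assumed l1 definition) and the
    y-player's external regret."""
    m_x, m_y = A.shape
    cum_x, cum_y = np.zeros(m_x), np.zeros(m_y)
    last_x, last_y = np.zeros(m_x), np.zeros(m_y)
    c_hat_x = 0.0
    ys, losses_y = [], []
    for t in range(T):
        x_suggested = neg_exp_weights(cum_x + last_x, eta)
        y = neg_exp_weights(cum_y + last_y, eta)
        # Corruption (assumed model): the x-player may ignore its prescribed strategy.
        x = np.eye(m_x)[corrupt_action] if t in corrupt_rounds else x_suggested
        c_hat_x += np.abs(x - x_suggested).sum()
        loss_x = A @ y        # the x-player still updates on the losses it observes
        loss_y = -(A.T @ x)   # the y-player faces the strategy actually played
        cum_x += loss_x
        cum_y += loss_y
        last_x, last_y = loss_x, loss_y
        ys.append(y)
        losses_y.append(loss_y)
    ys, losses_y = np.array(ys), np.array(losses_y)
    regret_y = np.sum(ys * losses_y) - losses_y.sum(axis=0).min()
    return c_hat_x, regret_y
```

Passing an empty `corrupt_rounds` recovers the honest regime with $\widehat{C}_x = 0$. Since the $\ell_1$ distance between two distributions is at most 2, deviating in $k$ rounds adds at most $2k$ to $\widehat{C}_x$ under this assumed definition.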

Theorems & Definitions (58)

  • Theorem 1: Informal version of the formal theorem thm:indiv_reg_corrupt
  • Theorem 2: Informal version of the formal theorem thm:indiv_swapreg
  • Definition 1: Nash equilibrium
  • Theorem 3 (Freund and Schapire, 1999)
  • Definition 2: Correlated equilibrium (Aumann, 1974)
  • Theorem 4 (Foster and Vohra, 1997)
  • Definition 3: Corrupted regime with corruption level $\{(\widehat{C}_i, \widetilde{C}_i)\}_{i \in [n]}$
  • Remark 1
  • Proposition 5
  • Theorem 6: External regret upper bounds
  • ...and 48 more
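The list above includes correlated equilibrium, and the abstract's multi-player result is stated for swap regret together with the stability of a Markov chain. A standard way to obtain swap regret is a Blum–Mansour-style reduction, in which the played strategy is the stationary distribution of a row-stochastic matrix built from per-action sub-learners; whether the paper's Markov chain arises exactly this way is an assumption here. The minimal sketch below only computes a stationary distribution and compares external and swap regret of a given play sequence; the function names are hypothetical.

```python
import numpy as np

def stationary_distribution(Q):
    """Solve x = x Q with sum(x) = 1 for a row-stochastic matrix Q.

    Assumes the chain has a unique stationary distribution (e.g. all entries of Q > 0)."""
    m = Q.shape[0]
    # Stack the fixed-point equations (Q^T - I) x = 0 with the normalization 1^T x = 1.
    M = np.vstack([Q.T - np.eye(m), np.ones((1, m))])
    b = np.zeros(m + 1)
    b[-1] = 1.0
    x, *_ = np.linalg.lstsq(M, b, rcond=None)
    return x

def external_regret(X, L):
    """X[t] is the mixed strategy at round t and L[t] the loss vector at round t."""
    return np.sum(X * L) - L.sum(axis=0).min()

def swap_regret(X, L):
    """Best swap function chosen independently for each action a:
    sum_a max_b sum_t X[t, a] * (L[t, a] - L[t, b])."""
    M = X.T @ L  # M[a, b] = sum_t X[t, a] * L[t, b]
    return np.sum(np.diag(M) - M.min(axis=1))
```

For example, playing action 0 and then action 1 against losses that penalize exactly the action played (`X = L = np.eye(2)`) gives external regret 1 but swap regret 2, because each action can be swapped for the other. Swap regret always upper-bounds external regret, and sublinear swap regret for every player is what drives the empirical play toward the set of correlated equilibria.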