
A Controllability Perspective on Steering Follow-the-Regularized-Leader Learners in Games

Heling Zhang, Siqi Du, Roy Dong

Abstract

Follow-the-regularized-leader (FTRL) algorithms have become popular in the context of games, providing easy-to-implement methods for each agent, as well as theoretical guarantees that the strategies of all agents will converge to some equilibrium concept (provided that all agents follow the appropriate dynamics). However, with these methods, each agent ignores the coupling in the game, and treats their payoff vectors as exogenously given. In this paper, we take the perspective of one agent (the controller) deciding their mixed strategies in a finite game, while one or more other agents update their mixed strategies according to continuous-time FTRL. Viewing the learners' dynamics as a nonlinear control system evolving on the relative interior of a simplex or product of simplices, we ask when the controller can steer the learners to a target state, using only its own mixed strategy and without modifying the game's payoff structure. For the two-player case we provide a necessary and sufficient criterion for controllability based on the existence of a fully mixed neutralizing controller strategy and a rank condition on the projected payoff map. For multi-learner interactions we give two sufficient controllability conditions, one based on uniform neutralization and one based on a periodic-drift hypothesis together with a Lie-algebra rank condition. We illustrate these results on canonical examples such as Rock-Paper-Scissors and a construction related to Brockett's integrator.

Paper Structure

This paper contains 26 sections, 11 theorems, 93 equations, and 3 figures.

Key Result

Proposition 1

Consider a driftless control-affine system $\dot{x} = \sum_{i=1}^m u_i f_i(x)$ on a connected manifold $M$ with a proper control set $U$. If the family of vector fields $\{f_1, \dots, f_m\}$ is bracket generating on $M$, then the system is small-time locally controllable (STLC) on $M$.
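To see the bracket-generating condition in action, the sketch below checks it for the vector fields of Brockett's nonholonomic integrator, the standard driftless system that the paper's later construction is related to (the specific payoff structure of that game is not reproduced here). It computes the Lie bracket $[f_1, f_2] = (Dg)f - (Df)g$ symbolically and verifies that $\{f_1, f_2, [f_1, f_2]\}$ spans $\mathbb{R}^3$, so Proposition 1 applies. This is a minimal illustration using SymPy, not code from the paper.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
X = sp.Matrix([x1, x2, x3])

# Vector fields of Brockett's integrator: xdot = u1*f1 + u2*f2.
f1 = sp.Matrix([1, 0, -x2])
f2 = sp.Matrix([0, 1, x1])

def lie_bracket(f, g, X):
    """Lie bracket [f, g] = (Dg) f - (Df) g."""
    return g.jacobian(X) * f - f.jacobian(X) * g

f3 = lie_bracket(f1, f2, X)            # a new direction not in span{f1, f2}
M = sp.Matrix.hstack(f1, f2, f3)
print(f3.T, M.rank())                  # rank 3 => bracket generating on R^3
```

Since `f1` and `f2` span only a 2-dimensional distribution at each point, the third direction is reachable only through the bracket; the rank-3 check is exactly the hypothesis of the Chow-Rashevskii proposition above.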

Figures (3)

  • Figure 1: Approximate attainable sets for the Modified RPS game: We plot the approximate attainable set for three random initial points in $\Delta_{3}^{\circ}$, for two different variants of FTRL. The figures on the left show the case where the learner adopts the Replicator Dynamics, while the figures on the right show that of a learner who adopts FTRL with $h(x) = \frac{1}{2}\|x\|^{2}$. The reachable set is approximated by plotting the union of states generated by constant controls $u \in \Delta_3$ sampled on the simplex lattice $u=(i,j,k)/50$, $i+j+k=50$, and time horizons $t \in [0,12]$ sampled on a uniform grid of $45$ points.
  • Figure 2: Interdependency graph for the Brockett's Integrator game: This graph shows the interdependency of players in the Brockett's Integrator game presented in Section \ref{ex:BI}. As shown in the graph, learner 1's payoff depends only on the probability of the controller playing his first two strategies, and learner 2's payoff depends only on the probability of the controller playing his last two strategies. Learner 3's payoff depends on the probability of learners 1 and 2 playing their first strategies, as well as the entire mixed strategy of the controller.
  • Figure 3: Interdependency graph for the Regulated Matching Pennies game: This figure shows the interdependency among players in the Regulated Matching Pennies game presented in Section \ref{ex:RMP}. The two learners are involved in a two-player zero-sum game with a payoff matrix chosen by the controller. Again, we use $u \in \Delta_{3}$ to represent the controller's mixed strategy, $x_{i} \in \Delta_{2}$ to represent the mixed strategy of learner $i$, and $p_{i}$ to represent the payoff vector of learner $i$.
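The reachable-set approximation described in the Figure 1 caption can be sketched as follows: hold the controller's mixed strategy $u$ constant, integrate the learner's Replicator Dynamics $\dot{x}_i = x_i\,\big((Au)_i - x^\top A u\big)$, and record the states visited. The payoff matrix `A` below is the standard Rock-Paper-Scissors matrix; the paper's "Modified RPS" payoffs may differ, so this is an illustrative assumption rather than a reproduction of the paper's experiment.

```python
import numpy as np

# Standard RPS payoff matrix (assumed; the paper uses a modified variant).
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def replicator_flow(x0, u, T=12.0, dt=1e-3):
    """Forward-Euler integration of x_i' = x_i * ((A u)_i - x . A u)
    for a fixed controller strategy u, so the payoff vector is constant."""
    x = np.array(x0, dtype=float)
    p = A @ u
    for _ in range(int(T / dt)):
        x = x + dt * x * (p - x @ p)
        x = np.clip(x, 1e-12, None)
        x /= x.sum()               # guard against numerical drift off the simplex
    return x

x0 = np.array([1/3, 1/3, 1/3])     # interior initial point
u = np.array([0.5, 0.3, 0.2])      # one constant control from the simplex lattice
xT = replicator_flow(x0, u, T=5.0)
```

Sweeping `u` over a simplex lattice and `T` over a time grid, as in the caption, and taking the union of the resulting states gives the plotted approximation of the attainable set. Since the payoff vector $Au$ is constant along each trajectory, the learner's mass concentrates on the best response to $u$, which is what makes the union over many controls informative.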

Theorems & Definitions (26)

  • Definition 1: Finite Games
  • Definition 2: Proper Control Set
  • Definition 3: Bracket Generating
  • Proposition 1: Chow-Rashevskii
  • Proposition 2: Local Controllability Implies Global Controllability
  • Proposition 3: Krener's Theorem
  • Remark 1
  • Proposition 4: State equivalence preserves controllability
  • Lemma 1
  • Remark 2: Why we work on $\mathcal{X} ^\circ$
  • ...and 16 more