Transformers as Game Players: Provable In-context Game-playing Capabilities of Pre-trained Models

Chengshuai Shi, Kun Yang, Jing Yang, Cong Shen

TL;DR

Focusing on the classical two-player zero-sum games, theoretical guarantees are provided to demonstrate that pre-trained transformers can provably learn to approximate Nash equilibrium in an in-context manner for both decentralized and centralized learning settings.

Abstract

The in-context learning (ICL) capability of pre-trained models based on the transformer architecture has received growing interest in recent years. While theoretical understanding has been obtained for ICL in reinforcement learning (RL), the previous results are largely confined to the single-agent setting. This work proposes to further explore the in-context learning capabilities of pre-trained transformer models in competitive multi-agent games, i.e., in-context game-playing (ICGP). Focusing on the classical two-player zero-sum games, theoretical guarantees are provided to demonstrate that pre-trained transformers can provably learn to approximate Nash equilibrium in an in-context manner for both decentralized and centralized learning settings. As a key part of the proof, constructional results are established to demonstrate that the transformer architecture is sufficiently rich to realize celebrated multi-agent game-playing algorithms, in particular, decentralized V-learning and centralized VI-ULCB.
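To make "approximating a Nash equilibrium" concrete, the sketch below computes the Nash-equilibrium gap of a policy pair in a single-step zero-sum matrix game. This is a simplified stand-in for the multi-step games the paper studies, not the authors' code. The function name `ne_gap`, the matching-pennies payoff matrix, and the convention that the gap is the best-response value against the min-player minus the worst-case value of the max-player are all illustrative assumptions. Under this convention a pair would be called an $\epsilon$-approximate Nash equilibrium when the gap is at most $\epsilon$.

```python
def ne_gap(payoff, mu, nu):
    """Nash-equilibrium gap of the policy pair (mu, nu) in a zero-sum matrix game.

    payoff[i][j] is the max-player's reward when the max-player picks row i
    and the min-player picks column j (hypothetical single-step setting).
    mu and nu are probability vectors over rows and columns respectively.
    """
    rows, cols = len(payoff), len(payoff[0])
    # Expected reward of each pure row when the min-player plays nu.
    row_values = [sum(payoff[i][j] * nu[j] for j in range(cols)) for i in range(rows)]
    # Expected reward of each pure column when the max-player plays mu.
    col_values = [sum(mu[i] * payoff[i][j] for i in range(rows)) for j in range(cols)]
    # Best response against nu minus the worst case against mu.
    return max(row_values) - min(col_values)


# Matching pennies: the unique equilibrium is uniform play by both sides.
MATCHING_PENNIES = [[1, -1], [-1, 1]]

uniform_gap = ne_gap(MATCHING_PENNIES, [0.5, 0.5], [0.5, 0.5])
exploitable_gap = ne_gap(MATCHING_PENNIES, [1.0, 0.0], [0.5, 0.5])
print(uniform_gap, exploitable_gap)
```

The uniform pair has zero gap, while a max-player that always picks the first row can be exploited, which shows up as a strictly positive gap. A curve of this quantity over episodes is the kind of measurement that Figure 2 reports.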

Paper Structure

This paper contains 53 sections, 15 theorems, 162 equations, 2 figures, 3 algorithms.

Key Result

Theorem 3.3

Let $\widehat{\bm{\theta}}_+$ be the max-player's pre-training output defined in Sec. subsubsec:decentralized_basics. Take $\mathcal{N}_{\Theta_+} = \mathcal{N}_{\Theta_+}(1/N)$ as in Definition 3.1. Then, under Assumption aspt:decentralized_realizability, with probability at [...] A similar result holds for the min-player's pre-training output $\widehat{\bm{\theta}}_-$.

Figures (2)

  • Figure 1: An overall view of the framework, where the in-context game-playing (ICGP) capabilities of transformers are studied in both decentralized and centralized learning settings. The orange arrows denote the supervised pre-training procedure and the blue arrows mark the inference procedure.
  • Figure 2: Comparisons of Nash equilibrium (NE) gaps over episodes in both decentralized and centralized learning scenarios, averaged over $10$ inference games.

Theorems & Definitions (34)

  • Definition 2.1: Approximate Nash equilibrium
  • Definition 2.2: Masked Attention Layer
  • Definition 2.3: MLP Layer
  • Definition 2.4: Decoder-based Transformer
  • Definition 3.1: Decentralized Covering Number
  • Theorem 3.3: Decentralized Pre-training Guarantee
  • Theorem 3.4
  • Theorem 3.5
  • Lemma 3.6
  • Theorem 4.1
  • ...and 24 more