Learning Strategy Representation for Imitation Learning in Multi-Agent Games

Shiqi Lei, Kanghoon Lee, Linjing Li, Jinkyoo Park

TL;DR

STRIL addresses sub-optimal data in offline imitation learning for multi-agent games by learning per-trajectory strategy representations via a partially-trainable-conditioned VRNN (P-VRNN). It introduces two indicators, RI and EL, that quantify trajectory quality from the learned representations: RI is unsupervised, while EL can be estimated from partial rewards. The method is a plug-in that filters the offline data before applying IL algorithms, improving performance across three competitive games and reducing reliance on player IDs or online interaction. Overall, STRIL improves data efficiency and robustness in offline multi-agent imitation learning by isolating dominant strategies through the learned representations and indicators.

Abstract

The offline datasets for imitation learning (IL) in multi-agent games typically contain player trajectories exhibiting diverse strategies, which necessitate measures to prevent learning algorithms from acquiring undesirable behaviors. Learning representations for these trajectories is an effective approach to depicting the strategies employed by each demonstrator. However, existing learning strategies often require player identification or rely on strong assumptions, which are not appropriate for multi-agent games. Therefore, in this paper, we introduce the Strategy Representation for Imitation Learning (STRIL) framework, which (1) effectively learns strategy representations in multi-agent games, (2) estimates proposed indicators based on these representations, and (3) filters out sub-optimal data using the indicators. STRIL is a plug-in method that can be integrated into existing IL algorithms. We demonstrate the effectiveness of STRIL across competitive multi-agent scenarios, including Two-player Pong, Limit Texas Hold'em, and Connect Four. Our approach successfully acquires strategy representations and indicators, thereby identifying dominant trajectories and significantly enhancing existing IL performance across these environments.
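The filtering step (3) can be pictured with a short sketch. It is only an illustration under stated assumptions, not the authors' implementation: it assumes each trajectory already has a scalar indicator score (for example RI or EL), that `p` is the percentage of data discarded, and that the caller states whether lower or higher scores mark better trajectories. The function name `filter_by_indicator` and its parameters are hypothetical.

```python
import numpy as np

def filter_by_indicator(trajectories, scores, p, lower_is_better=True):
    """Drop roughly the worst p percent of trajectories by indicator score.

    Assumptions (not taken from the paper): `scores` holds one scalar
    indicator value per trajectory, and `p` in [0, 100) is the share of
    data to discard.
    """
    scores = np.asarray(scores, dtype=float)
    if len(trajectories) != len(scores):
        raise ValueError("need exactly one score per trajectory")
    if not 0 <= p < 100:
        raise ValueError("p must lie in [0, 100)")

    if lower_is_better:
        # Keep scores at or below the (100 - p)-th percentile.
        threshold = np.percentile(scores, 100 - p)
        keep = scores <= threshold
    else:
        # Keep scores at or above the p-th percentile.
        threshold = np.percentile(scores, p)
        keep = scores >= threshold

    return [t for t, k in zip(trajectories, keep) if k]
```

The filtered list would then be passed unchanged to an existing IL algorithm, which is what makes the approach a plug-in.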

Paper Structure

This paper contains 33 sections, 2 theorems, 29 equations, 7 figures, 2 tables.

Key Result

Proposition 11.1

If $\tau(\pi)$ is a distribution over $\Pi$, and $E$ is defined as exploitability, then we have

Figures (7)

  • Figure 1: The overall diagram of Strategy Representation for Imitation Learning (STRIL).
  • Figure 2: The decomposed network structure of the P-VRNN model. The variables are depicted as circles, learnable parameters as diamonds, and partially-trainable variables as a combination of both diamonds and circles.
  • Figure 3: The learned strategy representations with different labels on the Two-player Pong (a-d), Limit Texas Hold’em (e-h), and Connect Four (i-l) environments.
  • Figure 4: $\operatorname{WS}$ of each IL algorithm across different percentile ($p$) values for each indicator. The grey-shaded region represents the model trained on the original dataset, equivalent to the vanilla algorithm. Moving further to the right in the subfigure indicates a decrease in the data used. Higher is better.
  • Figure 5: Illustration of EL and exploitability of a strategy in a two-player zero-sum game with three pure strategies.
  • ...and 2 more figures
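As a rough companion to the setting of Figure 5, the sketch below computes exploitability for a mixed strategy in a two-player zero-sum game with three pure strategies. It uses rock-paper-scissors as an assumed example game and the standard definition (game value minus the payoff obtained against a best-responding opponent). The paper's exact definition of $E$ and of EL is not reproduced in this summary, so the code may differ from it in detail.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (an assumed example
# game; rows and columns are ordered rock, paper, scissors).
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])

def exploitability(strategy, payoff, game_value=0.0):
    """Loss of a row-player mixed strategy against a best-responding opponent.

    Standard zero-sum definition: game value minus the worst-case expected
    payoff. `game_value` defaults to 0, which holds for symmetric games
    such as rock-paper-scissors.
    """
    strategy = np.asarray(strategy, dtype=float)
    # Expected row payoff against each pure strategy of the opponent.
    payoff_vs_pure = strategy @ payoff
    # The opponent's best response minimizes the row player's payoff.
    return game_value - payoff_vs_pure.min()
```

Under this definition the uniform strategy has exploitability 0, while always playing rock has exploitability 1, because the opponent can always answer with paper.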

Theorems & Definitions (4)

  • Proposition 11.1
  • Proposition 11.2
  • proof
  • proof