Learning Strategy Representation for Imitation Learning in Multi-Agent Games
Shiqi Lei, Kanghoon Lee, Linjing Li, Jinkyoo Park
TL;DR
STRIL addresses sub-optimal data in offline imitation learning for multi-agent games by learning a strategy representation for each trajectory with a partially-trainable-conditioned VRNN (P-VRNN). It introduces two indicators, RI and EL, that quantify trajectory quality from the learned representations: EL can be estimated from partial rewards, while RI requires no supervision. The method is a plug-in that filters the offline data before an IL algorithm is applied, improving performance across three competitive games and reducing reliance on player IDs or online interaction. Overall, STRIL improves data efficiency and robustness in offline multi-agent imitation learning by using the learned representations and indicators to isolate dominant strategies.
Abstract
The offline datasets for imitation learning (IL) in multi-agent games typically contain player trajectories exhibiting diverse strategies, which necessitate measures to prevent learning algorithms from acquiring undesirable behaviors. Learning representations for these trajectories is an effective approach to depicting the strategies employed by each demonstrator. However, existing approaches to learning such representations often require player identification or rely on strong assumptions, which are not appropriate for multi-agent games. Therefore, in this paper, we introduce the Strategy Representation for Imitation Learning (STRIL) framework, which (1) effectively learns strategy representations in multi-agent games, (2) estimates proposed indicators based on these representations, and (3) filters out sub-optimal data using the indicators. STRIL is a plug-in method that can be integrated into existing IL algorithms. We demonstrate the effectiveness of STRIL across competitive multi-agent scenarios, including Two-player Pong, Limit Texas Hold'em, and Connect Four. Our approach successfully acquires strategy representations and indicators, thereby identifying dominant trajectories and significantly enhancing existing IL performance across these environments.
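
The filtering step (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `filter_by_indicator`, the `keep_fraction` threshold, and the convention that a lower indicator value marks a better trajectory are all assumptions made for illustration. The indicator scores are taken as given, since computing RI or EL requires the learned representations, which are not reproduced here.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def filter_by_indicator(
    trajectories: Sequence[T],
    scores: Sequence[float],
    keep_fraction: float = 0.5,
    lower_is_better: bool = True,
) -> List[T]:
    """Keep the best-scoring fraction of trajectories (hypothetical helper).

    `scores` holds one precomputed indicator value per trajectory.
    Whether low or high values are preferable is an assumption exposed
    through `lower_is_better`.
    """
    if len(trajectories) != len(scores):
        raise ValueError("need exactly one score per trajectory")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Rank trajectory indices from best to worst according to the indicator.
    ranked = sorted(
        range(len(scores)),
        key=lambda i: scores[i],
        reverse=not lower_is_better,
    )
    n_keep = math.ceil(keep_fraction * len(ranked))

    # Restore the original dataset order among the kept trajectories.
    kept = sorted(ranked[:n_keep])
    return [trajectories[i] for i in kept]


# Toy usage: four placeholder trajectories with made-up indicator values.
demo_trajectories = ["traj_a", "traj_b", "traj_c", "traj_d"]
demo_scores = [0.9, 0.1, 0.5, 0.3]
filtered = filter_by_indicator(demo_trajectories, demo_scores, keep_fraction=0.5)
```

The filtered list would then be passed unchanged to whichever IL algorithm is in use, which is what makes the approach a plug-in: the downstream learner itself does not need to be modified.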
