
Balancing the AI Strength of Roles in Self-Play Training with Regret Matching+

Xiaoxi Wang

TL;DR

The paper tackles uneven strength of a single generalized agent across multiple roles in self-play by applying Regret Matching+ to manipulate the data distribution over role-pair combinations. It formalizes a regret framework with a matrix $R$ and uses exponential smoothing $\bar{r}_{s_t}(i,j)$, plus a weight update rule controlled by parameter $\eta$ to emphasize weaker role-pairs while preserving exploration. An explicit $N=3$ example and a fighting-game evaluation on 13 characters illustrate that RM+ improves balance and reduces variance in cross-role performance, enabling more robust, data-efficient training of a single policy for multi-role gameplay. The approach thus supports practical deployment of generalized agents in complex multi-role games.

Abstract

When training artificial intelligence for games encompassing multiple roles, the development of a generalized model capable of controlling any character within the game presents a viable option. This strategy not only conserves computational resources and time during the training phase but also reduces resource requirements during deployment. However, training such a generalized model often encounters challenges related to uneven capabilities when controlling different roles. A simple method based on Regret Matching+ is introduced, which yields more balanced strength across the roles the model controls.
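
The mechanism summarized above (smoothed per-pair results, a non-negative regret matrix, and sampling weights that favor weaker role pairs while keeping some exploration) can be illustrated with a minimal sketch of generic Regret Matching+. This is not the paper's exact update rule, which is not reproduced in this summary. The assumptions are: instantaneous regret is the gap between the mean smoothed win rate and a pair's own smoothed win rate; `eta` is treated as a uniform-mixing coefficient; and the names `smooth`, `rm_plus_update`, `sampling_weights`, `sample_pair`, and the smoothing factor `alpha` are hypothetical.

```python
import random


def smooth(prev, outcome, alpha=0.1):
    """Exponentially smoothed win rate for one role pair (alpha is assumed)."""
    return (1.0 - alpha) * prev + alpha * outcome


def rm_plus_update(regret, win_rate):
    """One RM+ step: accumulate instantaneous regret, then clip at zero.

    Assumption: the instantaneous regret of pair (i, j) is how far its
    smoothed win rate falls below the mean over all pairs.
    """
    n = len(win_rate)
    mean = sum(sum(row) for row in win_rate) / (n * n)
    return [
        [max(regret[i][j] + (mean - win_rate[i][j]), 0.0) for j in range(n)]
        for i in range(n)
    ]


def sampling_weights(regret, eta=0.2):
    """Mix regret-proportional weights with uniform weights.

    Assumption: eta is the share of probability mass spread uniformly,
    so every pair keeps a non-zero chance of being sampled.
    """
    n = len(regret)
    total = sum(sum(row) for row in regret)
    uniform = 1.0 / (n * n)
    if total == 0.0:
        # No positive regret yet: fall back to uniform sampling.
        return [[uniform] * n for _ in range(n)]
    return [
        [(1.0 - eta) * regret[i][j] / total + eta * uniform for j in range(n)]
        for i in range(n)
    ]


def sample_pair(weights, rng=random):
    """Draw one (role, opponent_role) pair according to the weights."""
    n = len(weights)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    flat = [weights[i][j] for i, j in pairs]
    return rng.choices(pairs, weights=flat, k=1)[0]


# Illustrative data only: three roles, with role 2 underperforming.
win_rate = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.2, 0.2, 0.2]]
regret = rm_plus_update([[0.0] * 3 for _ in range(3)], win_rate)
weights = sampling_weights(regret, eta=0.2)
next_pair = sample_pair(weights)
```

In this sketch, pairs involving the weak role accumulate positive regret and are sampled more often, while the clipping at zero (the "+" in RM+) prevents strong pairs from building up negative regret that would delay their return to the training mix if they later weaken.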

Paper Structure (5 sections, 14 equations, 1 table)