Balancing the AI Strength of Roles in Self-Play Training with Regret Matching+
Xiaoxi Wang
TL;DR
The paper tackles the uneven strength that a single generalized agent shows across the multiple roles it plays in self-play training. It applies Regret Matching+ (RM+) to reshape the data distribution over role-pair combinations. It formalizes a regret framework with a matrix $R$, smooths the regrets exponentially as $\bar{r}_{s_t}(i,j)$, and adds a weight update rule controlled by a parameter $\eta$ that emphasizes weaker role pairs while preserving exploration. An explicit $N=3$ example and an evaluation on 13 characters of a fighting game illustrate that RM+ improves balance and reduces the variance of cross-role performance, enabling more robust and data-efficient training of a single policy for multi-role gameplay. The approach thus supports the practical deployment of generalized agents in complex multi-role games.
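The summary does not spell out the exact update equations, so the following is only a minimal Python sketch of how Regret Matching+ could drive sampling over role pairs. Everything concrete in it is an assumption for illustration: the function name `rm_plus_role_pair_weights`, treating a pair's shortfall from a 50% win rate as its utility, smoothing the instantaneous regret with a hypothetical factor `alpha` before the RM+ clip, and reading $\eta$ as a mix with the uniform distribution that keeps every pair explored.

```python
# Minimal, hypothetical sketch of Regret Matching+ (RM+) over role pairs.
# Assumptions (not taken from the paper): a pair's "utility" is its shortfall
# from a 50% win rate, regrets are exponentially smoothed with `alpha`
# before the RM+ clip, and `eta` mixes in a uniform distribution.
from itertools import product


def rm_plus_role_pair_weights(win_rates, cum_regret, smoothed, sigma,
                              alpha=0.1, eta=0.1):
    """Run one RM+ step and return a new N x N sampling distribution.

    win_rates[i][j]: observed win rate of the agent as role i vs. role j.
    cum_regret[i][j]: RM+ cumulative regret (updated in place, kept >= 0).
    smoothed[i][j]: smoothed instantaneous regret (updated in place).
    sigma[i][j]: current sampling distribution over role pairs (read only).
    """
    n = len(win_rates)
    pairs = list(product(range(n), repeat=2))
    # Utility of training on a pair: how far the agent falls short of 50%.
    deficit = {(i, j): 0.5 - win_rates[i][j] for i, j in pairs}
    # Expected utility under the current sampling distribution.
    expected = sum(sigma[i][j] * deficit[(i, j)] for i, j in pairs)
    for i, j in pairs:
        instant = deficit[(i, j)] - expected  # regret for not picking (i, j)
        smoothed[i][j] = (1 - alpha) * smoothed[i][j] + alpha * instant
        # RM+: accumulate, then clip at zero.
        cum_regret[i][j] = max(0.0, cum_regret[i][j] + smoothed[i][j])
    total = sum(cum_regret[i][j] for i, j in pairs)
    uniform = 1.0 / (n * n)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        matched = cum_regret[i][j] / total if total > 0 else uniform
        # Uniform mixing keeps every role pair in the training data.
        weights[i][j] = (1 - eta) * matched + eta * uniform
    return weights


# Usage with N = 3 hypothetical roles; role 2 is clearly the weakest.
N = 3
win_rates = [[0.5, 0.6, 0.7],
             [0.4, 0.5, 0.6],
             [0.3, 0.4, 0.5]]
sigma = [[1.0 / (N * N)] * N for _ in range(N)]
cum_regret = [[0.0] * N for _ in range(N)]
smoothed = [[0.0] * N for _ in range(N)]
weights = rm_plus_role_pair_weights(win_rates, cum_regret, smoothed, sigma)
for row in weights:
    print([round(w, 3) for w in row])
```

With these made-up win rates, the pair (2, 0) receives the largest sampling weight, while the $\eta$ floor keeps every pair in the training mix. Clipping cumulative regret at zero (the "+" in RM+) lets a pair that was once strong but becomes weak again regain weight quickly, instead of first working off a large negative regret.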
Abstract
When training artificial intelligence for games that feature multiple roles, developing a generalized model capable of controlling any character in the game is a viable option. This strategy not only saves computational resources and time during training but also reduces resource requirements during deployment. However, such a generalized model often shows uneven strength when controlling different roles. A simple method based on Regret Matching+ is introduced that helps the model perform with more balanced strength across the roles it controls.
