Variational Offline Multi-agent Skill Discovery
Jiayu Chen, Tian Lan, Vaneet Aggarwal
TL;DR
This work introduces VO-MASD, a framework for offline multi-agent skill discovery that extracts subgroup and temporal abstractions as discrete skills. Using auto-encoder architectures with 3D and hierarchical codebooks together with a dynamic grouping function, it automatically discovers coordination patterns across subgroups of varying size and transfers the resulting skills to online multi-agent reinforcement learning (MARL) tasks. Empirical results on the StarCraft benchmarks SMAC and SMACv2 show improved performance, especially on unseen tasks and in sparse-reward settings, with VO-MASD-Hier often delivering the strongest results. The approach enables scalable multi-task skill reuse and offers a principled path toward automatic coordination-pattern discovery in cooperative MARL.
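Discrete skills of this kind are commonly obtained by mapping continuous encoder outputs to the nearest entry of a learned codebook, as in VQ-VAE. The sketch below shows only that generic nearest-codeword lookup under assumed shapes; it is not the authors' 3D or hierarchical codebook, and the function name `quantize` and the array layouts are hypothetical. Codebook training, such as straight-through gradients and commitment losses, is omitted.

```python
import numpy as np


def quantize(z, codebook):
    """Generic VQ lookup (hypothetical helper, not the paper's implementation).

    z:        (n, d) array of continuous encoder outputs.
    codebook: (K, d) array of codewords.
    Returns (indices, quantized): indices[i] is the id of the codeword
    closest to z[i] in squared L2 distance, i.e. a discrete "skill" id.
    """
    # Pairwise squared distances between latents and codewords: shape (n, K)
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


if __name__ == "__main__":
    codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
    z = np.array([[0.9, 1.2], [-0.1, 0.1], [-0.8, 1.7]])
    idx, q = quantize(z, codebook)
    print(idx.tolist())  # [1, 0, 2]
```

Each latent vector is replaced by a codeword index, so a continuous trajectory segment becomes a discrete token that a high-level policy can select.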
Abstract
Skills are effective temporal abstractions for sequential decision making: they enable efficient hierarchical learning for long-horizon tasks and facilitate multi-task learning through their transferability. Despite extensive research, gaps remain in multi-agent settings, particularly in automatically extracting subgroup coordination patterns from a multi-agent task. To address this, we propose two novel auto-encoder schemes, VO-MASD-3D and VO-MASD-Hier, that simultaneously capture subgroup- and temporal-level abstractions to form multi-agent skills, making them the first to solve this challenge. An essential algorithmic component of these schemes is a dynamic grouping function that automatically detects latent subgroups based on agent interactions in a task. Further, our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across related tasks without retraining. Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing hierarchical multi-agent reinforcement learning (MARL) methods. Moreover, skills discovered with our method effectively reduce the learning difficulty in MARL scenarios with delayed and sparse reward signals. The codebase is available at https://github.com/LucasCJYSDL/VOMASD.
