Variational Offline Multi-agent Skill Discovery

Jiayu Chen, Tian Lan, Vaneet Aggarwal

TL;DR

This work introduces VO-MASD, a framework for offline multi-agent skill discovery that extracts subgroup and temporal abstractions as discrete skills. By employing auto-encoder architectures with 3D and hierarchical codebooks and a dynamic grouping function, it automatically discovers coordination patterns across varying subgroup sizes and transfers these skills to online MARL tasks. Empirical results on StarCraft SMAC and SMACv2 demonstrate improved performance, especially in unseen tasks and sparse-reward settings, with VO-MASD-Hier often delivering the strongest results. The approach enables scalable, multi-task skill reuse and offers a principled path toward automatic coordination pattern discovery in cooperative MARL.

Abstract

Skills are effective temporal abstractions for sequential decision making: they enable efficient hierarchical learning for long-horizon tasks and facilitate multi-task learning through their transferability. Despite extensive research, gaps remain in multi-agent scenarios, particularly in automatically extracting subgroup coordination patterns in a multi-agent task. To address this, we propose two novel auto-encoder schemes, VO-MASD-3D and VO-MASD-Hier, which simultaneously capture subgroup- and temporal-level abstractions to form multi-agent skills and are the first to solve the aforementioned challenge. An essential algorithmic component of these schemes is a dynamic grouping function that automatically detects latent subgroups based on agent interactions in a task. Further, our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across relevant tasks without retraining. Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing hierarchical multi-agent reinforcement learning (MARL) methods. Moreover, skills discovered using our method can effectively reduce the learning difficulty in MARL scenarios with delayed and sparse reward signals. The codebase is available at https://github.com/LucasCJYSDL/VOMASD.

Paper Structure (20 sections, 3 equations, 10 figures, 1 table, 2 algorithms)


Figures (10)

  • Figure 1: Multi-agent skill discovery based on a VQ-VAE with 3D codebooks.
  • Figure 2: Utilizing discovered skills for downstream CTDE MARL. Compared to standard CTDE MARL, individual actions $a^{1:n}$ are replaced with skill embeddings $z^{1:n}$ as the actor's output. These embeddings are then translated into skill codes and control segments using the pretrained VO-MASD components, as shown in Figure 1. Thus, only the individual actor $\pi_\omega$ and centralized critic $V_\eta$ need to be trained.
  • Figure 3: Multi-agent skill discovery based on a VQ-VAE with a hierarchical codebook.
  • Figure 4: Evaluation of effectiveness of the discovered skills using different algorithms for online MARL.
  • Figure 5: Evaluation results on SMACv2. We compare the performance of online MARL using skills discovered with our methods and ODIS. As an additional baseline, we also include HMASD, an online hierarchical MARL method that discovers task-specific skills during training, as opposed to leveraging prelearned skills from offline data, as done in our methods and ODIS.
  • ...and 5 more figures
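
The Figure 2 caption says that skill embeddings are translated into skill codes. In a generic VQ-VAE this step is a nearest-neighbor lookup in a codebook, and the sketch below illustrates only that generic step. It is not the authors' implementation: the function name `quantize`, the toy 2-D codebook, the example embeddings, and the squared Euclidean distance are all assumptions made for illustration, and the 3D and hierarchical codebooks of VO-MASD are not modeled.

```python
from typing import List, Sequence, Tuple


def quantize(
    z: Sequence[float], codebook: Sequence[Sequence[float]]
) -> Tuple[int, List[float]]:
    """Return (index, code) of the codebook entry nearest to z.

    Distance is squared Euclidean; ties go to the lowest index.
    """
    if not codebook:
        raise ValueError("codebook must not be empty")
    best_idx = 0
    best_dist = float("inf")
    for idx, code in enumerate(codebook):
        if len(code) != len(z):
            raise ValueError("embedding and code dimensions differ")
        dist = sum((a - b) ** 2 for a, b in zip(z, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, list(codebook[best_idx])


# Hypothetical toy codebook holding three 2-D skill codes.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

# Hypothetical actor outputs z^1 and z^2 for two agents.
embeddings = [[0.9, 1.2], [-0.2, 0.1]]

# Each agent's embedding is snapped to its nearest skill code.
skill_ids = [quantize(z, codebook)[0] for z in embeddings]
print(skill_ids)
```

Under this reading the downstream actor outputs a continuous embedding and the frozen codebook discretizes it, so only the actor and critic would need gradient updates, which matches the caption's statement that the pretrained components are reused as they are.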

Theorems & Definitions (1)

  • Definition 1