Cooperative Multi-Agent Planning with Adaptive Skill Synthesis
Zhiyuan Li, Wenshuai Zhao, Joni Pajarinen
TL;DR
The paper tackles sample efficiency, interpretability, and transferability in cooperative multi-agent systems by introducing COMPASS, a decentralized framework that combines a closed-loop planner based on a Vision-Language Model (VLM), an adaptive skill library bootstrapped from demonstrations, and a structured multi-hop communication protocol. COMPASS performs strongly on SMACv2, particularly in Protoss tasks, where it achieves a 57% win rate and surpasses baselines such as QMIX, MAPPO, HAPPO, and HASAC; extensive ablations isolate the contributions of skill bootstrapping, communication, and self-reflection. The approach emphasizes interpretable, code-based skills and dynamic strategy refinement in a partially observable, decentralized setting, offering a scalable path toward real-world multi-agent coordination. However, performance gaps in some race settings (notably Zerg) point to the need for better generalization and efficiency across diverse unit compositions and tactics.
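To make the idea of an interpretable, code-based skill library more concrete, here is a minimal Python sketch. It assumes that each skill is stored as a source string defining a function with the same name, that demonstrations have already been distilled into such strings, and that refining a skill means replacing its source and bumping a version counter. The names SkillLibrary, bootstrap, and focus_fire are hypothetical and are not taken from the COMPASS implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical sketch of a code-based skill library; not the COMPASS implementation.


@dataclass
class Skill:
    name: str
    source: str  # human-readable code, kept so the skill stays inspectable
    fn: Callable
    version: int = 1


@dataclass
class SkillLibrary:
    skills: Dict[str, Skill] = field(default_factory=dict)

    def add(self, name: str, source: str) -> Skill:
        """Compile a source string that defines a function called `name`.

        Re-adding an existing name replaces the skill and bumps its version,
        which stands in (by assumption) for planner-guided refinement.
        """
        namespace: Dict[str, object] = {}
        exec(source, namespace)  # sketch only: generated code should be sandboxed
        fn = namespace[name]
        if not callable(fn):
            raise TypeError(f"{name} is not callable")
        version = self.skills[name].version + 1 if name in self.skills else 1
        skill = Skill(name, source, fn, version)
        self.skills[name] = skill
        return skill

    def bootstrap(self, demonstrations: Dict[str, str]) -> None:
        """Seed the library from demonstration-derived skills (name -> source)."""
        for name, source in demonstrations.items():
            self.add(name, source)

    def call(self, name: str, *args, **kwargs):
        """Execute the current version of a skill."""
        return self.skills[name].fn(*args, **kwargs)


if __name__ == "__main__":
    lib = SkillLibrary()
    # Hypothetical demonstration-derived skill: target the weakest visible enemy.
    lib.bootstrap({
        "focus_fire": "def focus_fire(enemies):\n"
                      "    return min(enemies, key=lambda e: e['health'])['id']\n"
    })
    enemies = [{"id": "z1", "health": 0.4}, {"id": "z2", "health": 0.9}]
    print(lib.call("focus_fire", enemies), lib.skills["focus_fire"].version)
```

Keeping the source string next to the compiled function is what makes each skill readable after the fact; the trade-off is that executing planner-generated code requires isolation that this sketch deliberately omits.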
Abstract
Despite much progress in training distributed artificial intelligence (AI), building cooperative multi-agent systems with multi-agent reinforcement learning (MARL) still faces challenges in sample efficiency, interpretability, and transferability. Unlike traditional learning-based methods, which require extensive interaction with the environment, large language models (LLMs) demonstrate remarkable capabilities in zero-shot planning and complex reasoning. However, existing LLM-based approaches rely heavily on text-based observations and struggle with the non-Markovian nature of multi-agent interactions under partial observability. We present COMPASS, a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized closed-loop decision-making. The skill library, bootstrapped from demonstrations, evolves via planner-guided tasks to enable adaptive strategies. COMPASS propagates entity information through multi-hop communication under partial observability. Evaluations on the improved StarCraft Multi-Agent Challenge (SMACv2) demonstrate COMPASS's strong performance against state-of-the-art MARL baselines across both symmetric and asymmetric scenarios. Notably, in the symmetric Protoss 5v5 task, COMPASS achieved a 57% win rate, a 30-percentage-point advantage over QMIX (27%). The project page is available at https://stellar-entremet-1720bb.netlify.app/.
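Multi-hop propagation of entity information under partial observability can be illustrated with a small synchronous message-passing sketch. It assumes that each agent holds a dictionary of entity records keyed by entity id, that agents can only message neighbors in a fixed communication graph, and that conflicting records are resolved by keeping the most recent observation. The record layout (timestep, x, y, health), the hop count, and the function names are assumptions for illustration, not details from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch of multi-hop entity sharing; not the COMPASS protocol itself.
# Assumed record layout: (timestep observed, x, y, health).
EntityRecord = Tuple[int, float, float, float]
Knowledge = Dict[str, EntityRecord]


def merge(own: Knowledge, incoming: Knowledge) -> Knowledge:
    """Return a new dict keeping, per entity id, the most recently observed record."""
    merged = dict(own)
    for eid, rec in incoming.items():
        if eid not in merged or rec[0] > merged[eid][0]:
            merged[eid] = rec
    return merged


def multi_hop_share(local: List[Knowledge],
                    neighbors: List[List[int]],
                    hops: int) -> List[Knowledge]:
    """Propagate entity knowledge along the communication graph for `hops` rounds.

    Rounds are synchronous: each agent merges what its neighbors knew at the
    start of the round, so information travels exactly one hop per round.
    The input list is not modified.
    """
    knowledge = [dict(k) for k in local]
    for _ in range(hops):
        snapshot = [dict(k) for k in knowledge]
        for i, nbrs in enumerate(neighbors):
            for j in nbrs:
                knowledge[i] = merge(knowledge[i], snapshot[j])
    return knowledge


if __name__ == "__main__":
    # Chain 0-1-2: agent 0 sees enemy e1, agent 2 sees enemy e2, agent 1 sees nothing.
    local = [{"e1": (5, 1.0, 2.0, 0.8)}, {}, {"e2": (5, 9.0, 9.0, 1.0)}]
    neighbors = [[1], [0, 2], [1]]
    for h in (1, 2):
        print(h, [sorted(k) for k in multi_hop_share(local, neighbors, h)])
```

In the chain example, one hop lets the middle agent see both enemies, while the two end agents need a second hop to learn about the enemy observed at the far end; bounding the hop count trades coverage against communication cost.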
