MAexp: A Generic Platform for RL-based Multi-Agent Exploration

Shaohao Zhu; Jiacheng Zhou; Anjun Chen; Mingming Bai; Jiming Chen; Jinming Xu

TL;DR

This work tackles the sim-to-real gap in RL-based multi-agent exploration by introducing MAexp, a generic, high-efficiency platform that uses point-cloud maps and continuous actions to enable realistic, fast simulation across diverse scenarios. MAexp integrates six state-of-the-art MARL algorithms (IPPO, ITRPO, MAPPO, MATRPO, VDPPO, VDA2C) within a unified framework and supports a two-step policy workflow: a Multi-Agent Target Generator selects exploration goals, and a Single-Agent Motion Planner navigates each robot to its goal, which allows arbitrary team sizes and robot types. The authors report sampling nearly 40x faster than existing platforms and establish the first comprehensive benchmark across six scenarios, showing that each algorithm has distinct strengths depending on the environment and task structure. By combining high-fidelity, scalable simulation with systematic evaluation, the platform offers a practical tool for algorithm development, fair comparison, and rapid prototyping in RL-based multi-agent exploration, with room to add broader communication topologies and more scenarios in the future.

Abstract

The sim-to-real gap poses a significant challenge in RL-based multi-agent exploration due to scene quantization and action discretization. Existing platforms suffer from inefficient sampling and offer little diversity in Multi-Agent Reinforcement Learning (MARL) algorithms across different scenarios, which limits their widespread application. To fill these gaps, we propose MAexp, a generic platform for multi-agent exploration that integrates a broad range of state-of-the-art MARL algorithms and representative scenarios. Moreover, we employ point clouds to represent our exploration scenarios, leading to high-fidelity environment mapping and a sampling speed approximately 40 times faster than existing platforms. Furthermore, equipped with an attention-based Multi-Agent Target Generator and a Single-Agent Motion Planner, MAexp can work with arbitrary numbers of agents and accommodate various types of robots. Extensive experiments establish the first benchmark featuring several high-performance MARL algorithms across typical scenarios for robots with continuous actions, which highlights the distinct strengths of each algorithm in different scenarios.
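
The two-stage decomposition described above (a target generator that assigns goals, followed by a per-robot motion planner that outputs continuous actions) can be illustrated with a minimal Python sketch. This is not MAexp's code or API: the function names `generate_targets`, `plan_motion`, and `step` are hypothetical, the greedy nearest-goal rule is a simple stand-in for the learned attention-based generator, and the straight-line planner ignores obstacles and robot kinematics.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def generate_targets(agent_positions: List[Point],
                     candidate_goals: List[Point]) -> List[Point]:
    """Hypothetical stand-in for a multi-agent target generator.

    Greedy rule (an assumption, not the learned policy): each agent in
    turn claims the nearest goal that no earlier agent has claimed.
    """
    remaining = list(candidate_goals)
    targets = []
    for pos in agent_positions:
        if not remaining:
            targets.append(pos)  # no goal left: the agent stays in place
            continue
        goal = min(remaining, key=lambda g: math.dist(pos, g))
        remaining.remove(goal)
        targets.append(goal)
    return targets


def plan_motion(pos: Point, goal: Point, max_step: float = 1.0) -> Point:
    """Hypothetical stand-in for a single-agent motion planner.

    Returns a continuous 2D displacement toward the goal, clipped to
    max_step per time step. Obstacles are ignored in this sketch.
    """
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return (dx, dy)
    scale = max_step / dist
    return (dx * scale, dy * scale)


def step(agent_positions: List[Point],
         candidate_goals: List[Point],
         max_step: float = 1.0) -> List[Point]:
    """One simulation step: assign goals jointly, then move each agent."""
    targets = generate_targets(agent_positions, candidate_goals)
    new_positions = []
    for pos, goal in zip(agent_positions, targets):
        vx, vy = plan_motion(pos, goal, max_step)
        new_positions.append((pos[0] + vx, pos[1] + vy))
    return new_positions
```

Because the goal assignment is the only stage that looks at the whole team, the per-robot planner can be swapped for a different robot type without touching the assignment logic, which is the property the abstract attributes to this split.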
