Reinforcement Learning Driven Multi-Robot Exploration via Explicit Communication and Density-Based Frontier Search

Gabriele Calzolari; Vidya Sumathy; Christoforos Kanellakis; George Nikolakopoulos

TL;DR

The paper tackles scalable, robust exploration by multiple heterogeneous robots under partial observability and limited communication. It introduces a decentralized Reinforcement Learning framework using centralized training with decentralized execution (CTDE), built on an agent-centered field-of-view (FOV) occupancy grid, $\text{A}^*$-informed frontier features, and a constrained, proximity-based data-sharing mechanism. Policies are optimized with HAPPO and a shared critic. Two reward schemes encourage data sharing and frontier-driven exploration. Simulations in Gymnasium and real-robot ROS2 trials show faster discovery and reduced redundancy: inter-agent map sharing shortens exploration time, and deployment on real heterogeneous robots is shown to be feasible, with extension to a broader range of platforms identified as a direction for further work.
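The proximity-based data-sharing idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the cell encoding (`UNKNOWN`, `FREE`, `OBSTACLE`), the Euclidean range test, and the function names `merge_maps` and `share_if_close` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical cell encoding, chosen for this sketch only.
UNKNOWN, FREE, OBSTACLE = -1, 0, 1


def merge_maps(own, other):
    """Fill cells that are unknown in `own` with values known in `other`.

    Returns the merged map and the number of newly known cells.
    """
    merged = own.copy()
    fill = (own == UNKNOWN) & (other != UNKNOWN)
    merged[fill] = other[fill]
    return merged, int(fill.sum())


def share_if_close(pos_a, pos_b, map_a, map_b, comm_range):
    """Exchange maps only if the two agents are within `comm_range` cells.

    The Euclidean distance test is an assumption; the paper's constraint
    may be defined differently.
    """
    delta = np.asarray(pos_a, dtype=float) - np.asarray(pos_b, dtype=float)
    if np.linalg.norm(delta) > comm_range:
        return map_a, map_b, 0, 0  # out of range: nothing is shared
    new_a, gain_a = merge_maps(map_a, map_b)
    new_b, gain_b = merge_maps(map_b, map_a)
    return new_a, new_b, gain_a, gain_b
```

The returned gains count how many cells each agent learned from the exchange, which is the kind of quantity a communication-oriented reward could be based on.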

Abstract

Collaborative multi-agent exploration of unknown environments is crucial for search and rescue operations. Effective real-world deployment must address challenges such as limited inter-agent communication and the presence of static and dynamic obstacles. This paper introduces a novel decentralized collaborative framework based on Reinforcement Learning to enhance multi-agent exploration in unknown environments. Our approach enables agents to decide their next action using an agent-centered field-of-view occupancy grid together with features extracted from $\text{A}^*$-based trajectories to frontiers in the reconstructed global map. Furthermore, we propose a constrained communication scheme that enables agents to share their environmental knowledge efficiently, minimizing exploration redundancy. The decentralized nature of our framework ensures that each agent operates autonomously while contributing to a collective exploration mission. Extensive simulations in Gymnasium and real-world experiments demonstrate the robustness and effectiveness of our system. The results highlight the benefits of combining autonomous exploration with inter-agent map sharing, advancing the development of scalable and resilient robotic exploration systems.
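To make the notion of frontier features concrete, the sketch below finds frontier cells in an occupancy grid and measures the $\text{A}^*$ path length to one of them. It rests on stated assumptions rather than on the paper's method: a frontier is taken to be a free cell with at least one unknown 4-neighbor (a common textbook definition), motion is 4-connected with unit cost, and the names `find_frontiers` and `astar_length` as well as the cell encoding are hypothetical. The density-based frontier selection named in the title is not reproduced here.

```python
import heapq

import numpy as np

# Hypothetical cell encoding, chosen for this sketch only.
UNKNOWN, FREE, OBSTACLE = -1, 0, 1
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def find_frontiers(grid):
    """Return free cells that touch at least one unknown 4-neighbor."""
    rows, cols = grid.shape
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def astar_length(grid, start, goal):
    """Steps on the shortest path through known free cells, or None.

    Uses the Manhattan distance as heuristic, which never overestimates
    the cost on a 4-connected grid with unit step cost.
    """
    rows, cols = grid.shape

    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    best = {start: 0}
    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            return cost
        if cost > best[current]:
            continue  # stale heap entry
        for dr, dc in NEIGHBORS:
            nxt = (current[0] + dr, current[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt] != FREE:
                continue  # obstacles and unknown cells are not traversable
            if cost + 1 < best.get(nxt, float("inf")):
                best[nxt] = cost + 1
                heapq.heappush(open_heap, (cost + 1 + heuristic(nxt), cost + 1, nxt))
    return None
```

A per-frontier feature vector could then combine quantities such as the path length returned by `astar_length`; which features the authors actually extract is not stated in this summary.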

Paper Structure

This paper contains 13 sections, 1 equation, 6 figures, 1 table, 1 algorithm.

Introduction
Related Work
Methodology
Overview of the RL environment
Decentralized-POMDP architecture
Inter-agent communication strategy
Multi-agent reinforcement learning algorithm
Design of the actors and shared critic architectures
Communication-induced reward function
Training and simulations
Simulation results
Experimental validation
Conclusions

Figures (6)

Figure 1: Illustration of the centralized training framework for collaborative multi-agent exploration in unknown environments. The scenario involves four robots navigating an occupancy grid, where black cells denote obstacles and white cells indicate free space, implemented using Gymnasium. The diagram focuses on Agent 1, which gathers raw data from the arena and processes them through the observation mapper to compute the observation used by the actor policy to select the next action. Implementation details of this module are shown in the yellow section. Additionally, the shared critic, which receives observations from all agents along with the joint reward, facilitates actor policy training. The other agents follow a similar architecture to Agent 1.
Figure 2: Convolutional neural network architectures used to model the agent policy (a) and the shared critic (b) in the multi-agent exploration framework. Panel (a) shows the architecture of agent 1's policy $\pi_1$; the other agents use analogous structures.
Figure 3: Statistics from simulating the trained policies, evaluated across 200 exploration arenas with randomized static obstacles and agent initial positions, using the two proposed neural network architectures and reward functions.
Figure 4: Distribution of agents' map expansions due to inter-agent communication, for the four policy types across the 200 simulated exploration arenas.
Figure 5: From left to right: the first image depicts the custom-built lab setup with four TurtleBot3 mobile platforms, two communicating and two exploring, in the exploration arena outlined by the dashed line, with obstacles indicated by the green blocks. The remaining four images show the areas explored by each agent, represented by different colors, with the static obstacles highlighted in gray.
...and 1 more figure
