Investigating the Impact of Communication-Induced Action Space on Exploration of Unknown Environments with Decentralized Multi-Agent Reinforcement Learning

Gabriele Calzolari, Vidya Sumathy, Christoforos Kanellakis, George Nikolakopoulos

Abstract

This paper introduces a novel enhancement to decentralized multi-agent reinforcement learning (D-MARL) exploration: a communication-induced action space that improves the mapping efficiency of unknown environments using homogeneous agents. Efficient exploration of large environments relies heavily on inter-agent communication, yet real-world scenarios are often constrained by data transmission limits such as signal latency and bandwidth. Our proposed method optimizes each agent's policy using the heterogeneous-agent proximal policy optimization algorithm, allowing agents to autonomously decide whether to communicate or to explore, that is, whether to share their locally collected maps or continue exploring. We propose and compare multiple novel reward functions that integrate inter-agent communication and exploration, enhance mapping efficiency and robustness, and minimize exploration overlap. This article presents a framework developed in ROS2 to evaluate and validate the investigated architecture. Specifically, four TurtleBot3 Burgers are deployed in a Gazebo-designed environment filled with obstacles to evaluate the efficacy of the trained policies in mapping the exploration arena.

Paper Structure

This paper contains 28 sections, 4 equations, 11 figures, 5 tables, 2 algorithms.

Figures (11)

  • Figure 1: High-level overview of the exploration scheme where four TurtleBot3 Burgers navigate a Gazebo-designed environment with obstacles (dark gray). Each agent's policy decides the action to perform on the environment according to the morphology of the robot's neighborhood (blue) inferred through sensors
  • Figure 2: The environment is represented as a 2D grid map in which free, occupied, and undiscovered cells are shown in white, orange, and gray, respectively. The agents are denoted by $\square$ markers with colors red, green, blue, black, and purple. Fig. \ref{fig:arena_with_obstacles} depicts a completely known exploration arena showing the free and occupied cells. Each robot has its agent-specific map, as shown for the red agent in Fig. \ref{fig:agent_local_map}. The communication network and the communication-covered area of the red agent are shown in Fig. \ref{fig:agent_transmission_map} by the yellow ellipse and the shaded area, respectively. Finally, Fig. \ref{fig:agent_global_map} shows the red agent's collaborative map, obtained by blending the agent-specific maps of the communicating agents, namely the green and red ones
  • Figure 3: Proposed decentralized cooperative multi-agent reinforcement learning architecture for exploration. The occupancy grid depicts the environment with obstacles (black) and free cells (white), while the agents' positions are outlined by the colored cells. Furthermore, the components of each agent are shown, namely the agent-based policy ($\pi_k$) and the shared critic ($V$)
  • Figure 4: Generation of the agents' observations resulting from the execution of their actions within the environment. Specifically, the illustration shows agent $i_1$ (orange), agent $i_2$ (red), agent $i_3$ (green), and agent $i_4$ (blue) performing different actions, together with the resulting updates to the policies' observations
  • Figure 5: The diagrams depict the architecture of the neural networks used as the agents' policies (Fig. \ref{fig:agent_policy_network}) and the shared critics (Fig. \ref{fig:agent_critic_network}). Each illustration details the input size, the core layers composing the networks, and the resulting output data. Layers of the same type are highlighted with the same colors: green for convolutional layers (Conv), blue for flatten layers, red for fully-connected layers (FC), and orange for categorical distributions. Furthermore, each component specifies the output size of its respective layer
  • ...and 6 more figures
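
The map blending described in the Figure 2 caption, where a collaborative map is obtained from the agent-specific maps of communicating agents, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the cell encoding (`UNKNOWN`, `FREE`, `OCCUPIED`), the function name `blend_maps`, and in particular the conflict rule, which conservatively keeps a cell occupied when the two maps disagree.

```python
from enum import IntEnum
from typing import List


class Cell(IntEnum):
    """Hypothetical cell encoding; the paper's actual values are not given here."""
    UNKNOWN = -1   # undiscovered cell (gray in Figure 2)
    FREE = 0       # free cell (white in Figure 2)
    OCCUPIED = 1   # occupied cell (orange in Figure 2)


Grid = List[List[int]]


def blend_maps(own: Grid, received: Grid) -> Grid:
    """Blend an agent-specific map with a map received from another agent.

    Assumed rules (illustrative only):
      * a cell unknown in one map takes the value from the other map;
      * if both maps know the cell but disagree, OCCUPIED wins.
    """
    if len(own) != len(received) or any(
        len(r1) != len(r2) for r1, r2 in zip(own, received)
    ):
        raise ValueError("maps must have identical dimensions")

    blended: Grid = []
    for own_row, recv_row in zip(own, received):
        row = []
        for a, b in zip(own_row, recv_row):
            if a == Cell.UNKNOWN:
                row.append(int(b))          # adopt the other agent's knowledge
            elif b == Cell.UNKNOWN:
                row.append(int(a))          # keep our own knowledge
            else:
                row.append(int(max(a, b)))  # conflict: prefer OCCUPIED
        blended.append(row)
    return blended


if __name__ == "__main__":
    # Two small agent-specific maps with partially overlapping knowledge.
    red = [[0, -1, -1],
           [0, 1, -1]]
    green = [[-1, -1, 0],
             [-1, 0, 1]]
    for row in blend_maps(red, green):
        print(row)
```

Under these assumptions the blend is symmetric, so both communicating agents end up with the same collaborative map regardless of which one initiates the exchange. A different conflict rule (for example, trusting the most recent observation) would need per-cell timestamps, which this sketch does not model.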