Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-limited Multi-robot Exploration

Yixiao Ma, Jingsong Liang, Yuhong Cao, Derek Ming Siang Tan, Guillaume Sartoretti

TL;DR

This paper addresses bandwidth-limited multi-robot exploration by having robots learn to share fixed-size messages that encode the salient information in each robot's partial map. It introduces a graph-attention-based policy network and a privileged critic, trained within a soft actor-critic framework for a distributed, communication-constrained setting. The results show up to a 99.2% reduction in communication at the cost of only a 2.4% increase in total travel distance. They also show that allowing full map sharing further improves exploration efficiency, and that the approach scales well to larger teams. Overall, the approach enables cooperative, efficient exploration under tight bandwidth constraints and points to practical pathways for real-world deployment and further bandwidth reductions.

Abstract

Communication bandwidth is an important consideration in multi-robot exploration, where information exchange among robots is critical. While existing methods typically aim to reduce communication throughput, they either require significant computation or significantly compromise exploration efficiency. In this work, we propose a deep reinforcement learning framework based on communication and privileged reinforcement learning to achieve a significant reduction in bandwidth consumption, while minimally sacrificing exploration efficiency. Specifically, our approach allows robots to learn to embed the most salient information from their individual belief (partial map) over the environment into fixed-size messages. Robots then reason about their own belief as well as received messages to explore the environment in a distributed manner while avoiding redundant work. In doing so, we employ privileged learning and learned attention mechanisms to endow the critic (i.e., teacher) network with ground truth map knowledge to effectively guide the policy (i.e., student) network during training. Compared to relevant baselines, our model allows the team to reduce communication by up to two orders of magnitude, while only sacrificing a marginal 2.4% in total travel distance, paving the way for efficient, distributed multi-robot exploration in bandwidth-limited scenarios.
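To make the idea of a fixed-size message concrete, the following is a minimal sketch of single-query attention pooling, which compresses any number of map-node feature vectors into one vector of constant length. It is an illustration under stated assumptions, not the authors' network: the function name `attention_pool`, the feature dimension `DIM`, and the random `query` (which would be a trained parameter in a learned model) are all hypothetical.

```python
import math
import random


def attention_pool(node_features, query):
    """Compress a variable number of feature vectors into one fixed-size vector.

    Hypothetical helper: scaled dot-product attention with a single query.
    """
    d = len(query)
    # One relevance score per node: scaled dot product with the query.
    scores = [sum(q * x for q, x in zip(query, feat)) / math.sqrt(d)
              for feat in node_features]
    # Numerically stable softmax over the nodes.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of node features: always d numbers, however many nodes.
    message = [sum(w * feat[k] for w, feat in zip(weights, node_features))
               for k in range(d)]
    return message, weights


random.seed(0)
DIM = 8  # assumed message size, chosen only for this example
query = [random.gauss(0, 1) for _ in range(DIM)]  # stands in for a learned query
small_map = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(10)]
large_map = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(500)]

msg_small, w_small = attention_pool(small_map, query)
msg_large, w_large = attention_pool(large_map, query)
print(len(msg_small), len(msg_large))  # prints "8 8"
```

The message length depends only on `DIM`, so the cost of sending it stays constant as a robot's partial map grows, whereas sharing the map itself would scale with the number of nodes.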

Paper Structure (23 sections, 3 equations, 5 figures, 1 table, 1 algorithm)

Figures (5)

  • Figure 1: Example application of our approach to a multi-robot exploration task in a bandwidth-constrained environment (here, underwater).
  • Figure 2: Architecture of our policy and critic networks.
  • Figure 3: Observation structure for our policy and critic networks. The left part is the ground truth observation used in our critic network, while the four plots on the right respectively represent the partial observations used as input to the robots' policy networks. The large dots in red, blue, yellow, and green represent the current position of the robots, while the smaller nodes' varying colors denote varying exploration utility. We omit the depiction of the collision-free graph edges for visualization purposes.
  • Figure 4: Ablation of privileged learning. Both models were trained for 20,000 episodes, until convergence. A lower travel distance indicates better performance.
  • Figure 5: Example maps from our testing set.