IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity

Derek Ming Siang Tan, Yixiao Ma, Jingsong Liang, Yi Cheng Chng, Yuhong Cao, Guillaume Sartoretti

TL;DR

IR2 addresses information sharing in multi-robot exploration under sparse, intermittent connectivity by learning non-myopic rendezvous decisions with an attention-based DRL policy. It introduces a hierarchical graph formulation that balances long- and short-term goals, enabling scalable planning in large environments. The method is trained with curriculum learning and evaluated against state-of-the-art baselines in simulation, where it yields 6.6-34.1% shorter exploration paths along with improved map-sharing fairness; the learned policy is also deployed on real hardware. The work targets efficient, scalable multi-robot exploration under realistic communication constraints and lays groundwork for extensions to latency, packet loss, and 3D environments.

Abstract

Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communication constraints by assuming unrealistic global connectivity. Other works account for communication constraints (by maintaining close proximity or line of sight during information exchange), but are often inefficient. For instance, preplanned rendezvous approaches typically involve unnecessary detours resulting from poorly timed rendezvous, while pursuit-based approaches often result in short-sighted decisions due to their greedy nature. We present IR2, a deep reinforcement learning approach to information sharing for multi-robot exploration. Leveraging attention-based neural networks trained via reinforcement and curriculum learning, IR2 allows robots to effectively reason about the longer-term trade-offs between disconnecting for solo exploration and reconnecting for information sharing. In addition, we propose a hierarchical graph formulation to maintain a sparse yet informative graph, enabling our approach to scale to large-scale environments. We present simulation results in three large-scale Gazebo environments, which show that our approach yields 6.6-34.1% shorter exploration paths when compared to state-of-the-art baselines, and lastly deploy our learned policy on hardware. Our simulation training and testing code is available at https://ir2-explore.github.io.

Paper Structure (37 sections, 4 equations, 8 figures, 3 tables)

This paper contains 37 sections, 4 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Search-and-rescue robots collaborate to navigate uncharted disaster zones under realistic communication constraints. For example, the yellow robot must balance trade-offs between disconnecting from its team (blue robots) to independently explore (green arrows), and pursuing other robots to exchange information (red arrows).
  • Figure 2: Hierarchical Graph Formulation. Four-stage process illustrated with snapshots from different episodes: (a) Dense local graph construction around the robot's position. (b) Sparse global graph construction via offshoots toward frontiers. (c) Global graph merger combining different robots' global graphs (different colored nodes). The map-surplus utility paths ($s_{i,j}$) between robots are shown as paths with colors of increasing intensity (black to yellow). (d) Global graph pruning to remove nodes that do not lead to frontier centers (purple circles).
  • Figure 3: DRL-Based Planner Architecture. In a multi-robot setting, for any robot, our approach first merges and sparsifies the global graphs shared by robots within connectivity range. Thereafter, we augment the merged graph with additional information pertinent to exploration and rendezvous. This augmented graph is fed into an encoder-decoder attention-based neural network architecture similar to that of [ariadne_explore].
  • Figure 4: Corridor, Hybrid, and Complex maps (left to right).
  • Figure 5: Path visualization of competing approaches for three robots in Indoor [mtare_planner], where there is no opportunistic connectivity across walls. We observe that $IR^2$ exhibits the least trajectory overlap and backtracking, followed by Pursuit [mtare_planner], and finally Preplanned [lauren_explore].
  • ...and 3 more figures
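
As a rough illustration of the pruning step named in Figure 2(d), the sketch below repeatedly removes dead-end nodes that are not protected. It is only one plausible reading of "remove nodes that do not lead to frontier centers", not the authors' implementation: the function name `prune_graph`, the edge-list input, and the `keep` set (frontier centers plus robot positions) are all hypothetical.

```python
from collections import defaultdict


def prune_graph(edges, keep):
    """Repeatedly drop dead-end nodes that are not in `keep`.

    edges: iterable of (u, v) undirected edges (hypothetical input format).
    keep:  set of nodes that must survive, e.g. frontier centers and
           robot positions (an assumption made for this sketch).
    Returns the surviving adjacency as dict[node, set[node]].
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Start with every unprotected node of degree <= 1.
    stack = [n for n in adj if len(adj[n]) <= 1 and n not in keep]
    while stack:
        n = stack.pop()
        if n not in adj:  # already removed via another path
            continue
        for m in adj.pop(n):
            adj[m].discard(n)
            # Removing n may turn its neighbor into a new dead end.
            if len(adj[m]) <= 1 and m not in keep:
                stack.append(m)
    return dict(adj)


# Toy graph: the branch a -> c -> d leads to no frontier and is pruned.
edges = [("robot", "a"), ("a", "b"), ("b", "frontier"), ("a", "c"), ("c", "d")]
pruned = prune_graph(edges, keep={"robot", "frontier"})
```

In the toy graph only the chain from `robot` to `frontier` survives. Leaf removal alone does not eliminate cycles that lead nowhere, so a full implementation would likely need an additional reachability check.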