IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity
Derek Ming Siang Tan, Yixiao Ma, Jingsong Liang, Yi Cheng Chng, Yuhong Cao, Guillaume Sartoretti
TL;DR
IR2 addresses information sharing in multi-robot exploration under sparse, intermittent connectivity by learning non-myopic rendezvous decisions with an attention-based deep reinforcement learning (DRL) policy. It introduces a hierarchical graph formulation that balances long- and short-term goals, enabling scalable planning in large environments. The method is trained with curriculum learning and evaluated in simulation against state-of-the-art baselines, where it yields 6.6-34.1% shorter exploration paths and improved map-sharing fairness; the learned policy is also deployed on real hardware. The work offers practical impact for efficient, scalable multi-robot exploration under realistic communication constraints and lays groundwork for extensions to latency, packet loss, and 3D environments.
Abstract
Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communication constraints by assuming unrealistic global connectivity. Other works account for communication constraints (by maintaining close proximity or line of sight during information exchange), but are often inefficient. For instance, preplanned rendezvous approaches typically involve unnecessary detours resulting from poorly timed rendezvous, while pursuit-based approaches often result in short-sighted decisions due to their greedy nature. We present IR2, a deep reinforcement learning approach to information sharing for multi-robot exploration. Leveraging attention-based neural networks trained via reinforcement and curriculum learning, IR2 allows robots to effectively reason about the longer-term trade-offs between disconnecting for solo exploration and reconnecting for information sharing. In addition, we propose a hierarchical graph formulation that maintains a sparse yet informative graph, enabling our approach to scale to large environments. We present simulation results in three large-scale Gazebo environments, which show that our approach yields 6.6-34.1% shorter exploration paths than state-of-the-art baselines. Finally, we deploy our learned policy on hardware. Our simulation training and testing code is available at https://ir2-explore.github.io.
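To make the communication constraints mentioned above concrete, the following is a minimal sketch of a connectivity check between two robots on an occupancy grid. It is an illustration only, not the communication model used by IR2: the function names (`line_of_sight`, `can_share`), the grid encoding (1 = occupied, 0 = free), the `comm_range` parameter, and the choice to require both proximity and line of sight are all assumptions made for this example.

```python
import math


def line_of_sight(grid, a, b):
    """Walk the Bresenham line from cell a to cell b (both as (x, y)).

    Returns False as soon as an occupied cell (value 1) is encountered,
    including the endpoints. The grid is indexed as grid[y][x].
    """
    (x0, y0), (x1, y1) = a, b
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    while True:
        if grid[y0][x0] == 1:
            return False
        if (x0, y0) == (x1, y1):
            return True
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy


def can_share(grid, a, b, comm_range):
    """Hypothetical rule: robots can exchange maps only if they are within
    comm_range cells of each other (Euclidean) AND have line of sight."""
    return math.dist(a, b) <= comm_range and line_of_sight(grid, a, b)


# Small example map: a single obstacle in the middle row.
grid = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
top_row_ok = can_share(grid, (0, 0), (4, 0), comm_range=5)
blocked = can_share(grid, (0, 1), (4, 1), comm_range=5)
out_of_range = can_share(grid, (0, 0), (4, 0), comm_range=3)
```

Under a rule like this, two robots exploring separately are disconnected whenever a wall or too much distance separates them, which is the situation in which deciding when and where to reconnect becomes a planning problem.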
