CQLite: Communication-Efficient Multi-Robot Exploration Using Coverage-biased Distributed Q-Learning

Ehsan Latif, Ramviyas Parasuraman

TL;DR

The paper tackles the scalability challenge in multi-robot exploration by introducing CQLite, a distributed Q-learning framework that minimizes inter-robot communication through selective sharing of updated $Q$-values and ad-hoc map merging. It presents a coverage-biased reward structure and a Voronoi-based task partitioning strategy to coordinate exploration with limited data exchange, accompanied by theoretical convergence guarantees and time-complexity analysis. Extensive simulations in ROS/Gazebo and real-world Turtlebot3 experiments show that CQLite achieves faster convergence, larger coverage, and dramatically reduced communication and computation compared to RRT and DRL baselines. The work demonstrates practical benefits for resource-constrained, cooperative robotics applications and provides open-source ROS tooling for broader adoption.
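As a rough illustration of Voronoi-based task partitioning, the sketch below assigns each frontier point to its nearest robot, which is the discrete form of a Voronoi partition. It is not taken from the paper or its repository: the function name `voronoi_partition`, the robot names, and the coordinates are all hypothetical, and Euclidean distance is an assumed metric.

```python
import math

def voronoi_partition(robots, frontiers):
    """Assign each frontier to its nearest robot (a discrete Voronoi partition).

    robots: dict mapping a robot name to its (x, y) position (hypothetical layout).
    frontiers: list of (x, y) frontier points.
    Returns a dict mapping each robot name to the frontiers in its cell.
    """
    cells = {name: [] for name in robots}
    for frontier in frontiers:
        # Euclidean distance is an assumption made for this sketch.
        nearest = min(robots, key=lambda name: math.dist(robots[name], frontier))
        cells[nearest].append(frontier)
    return cells

# Illustrative values only.
robots = {"r1": (0.0, 0.0), "r2": (10.0, 0.0)}
frontiers = [(1.0, 1.0), (9.0, 2.0), (4.0, 0.0)]
cells = voronoi_partition(robots, frontiers)
```

Because each robot only considers frontiers inside its own cell, two robots are less likely to head for the same target, which is one plausible way such a partition can limit the coordination messages required.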

Abstract

Frontier exploration and reinforcement learning have historically been used to enable many mobile robots to explore complex surroundings autonomously and cooperatively. These methods need to keep an internal global map for navigation, but they do not account for the high cost of communication and information sharing between robots. This study presents CQLite, a novel distributed Q-learning technique designed to minimize data communication overhead between robots while achieving rapid convergence and thorough coverage in multi-robot exploration. CQLite uses ad hoc map merging and selectively shares updated Q-values at recently identified frontiers to significantly reduce communication costs. The theoretical analysis of CQLite's convergence and efficiency, together with extensive numerical verification on simulated indoor maps with several robots, demonstrates the method's novelty. With over 2x reductions in computation and communication alongside improved mapping performance, CQLite outperformed cutting-edge multi-robot exploration techniques such as Rapidly Exploring Random Trees (RRT) and Deep Reinforcement Learning (DRL). The related code is open-sourced at https://github.com/herolab-uga/cqlite.
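The idea of selectively sharing updated Q-values can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard tabular Q-learning update, and the helper names (`q_update`, `build_message`, `merge_message`), the grid-cell state encoding, the reward values, and the overwrite-on-merge policy are all hypothetical.

```python
ACTIONS = ["north", "south", "east", "west"]  # hypothetical action set

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update; returns the key that changed."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return (state, action)

def build_message(q, changed_keys, new_frontiers):
    """Keep only updated entries whose state is a recently detected frontier."""
    return {key: q[key] for key in changed_keys if key[0] in new_frontiers}

def merge_message(q, message):
    """Receiver side: overwrite local entries (an assumed merge policy)."""
    q.update(message)

# Illustrative run: two updates, only one of them at a new frontier cell.
q_table = {}
changed = set()
changed.add(q_update(q_table, (0, 0), "east", 1.0, (1, 0)))
changed.add(q_update(q_table, (5, 5), "north", 2.0, (5, 6)))
new_frontiers = {(5, 5)}

message = build_message(q_table, changed, new_frontiers)
peer_table = {}
merge_message(peer_table, message)
```

In this toy run the sender's table holds two entries, but the message carries only the one tied to the new frontier. That is the intuition behind the reduced communication cost; the actual savings depend on the sharing rule the paper defines.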

Paper Structure (15 sections, 23 equations, 7 figures, 1 table, 1 algorithm)

Figures (7)

  • Figure 1: Overview of the distributed CQLite method for efficient multi-robot exploration, shown with an illustrative simulation.
  • Figure 2: System architecture of CQLite distributed across several robots. It shows Robot $i$'s process, comprising the mapping, frontier detection, and Q-learning operations, along with the communication of local map and updated Q-value information to $n$ connected robots.
  • Figure 3: Outcome of a sample trial. It shows the maps generated by three robots in the House world (left column) and by three and six robots in the Bookstore world (center and right columns, respectively) for the three compared approaches: RRT (top), DRL (center), and CQLite (bottom). The robots' trajectories and their start and end locations in the simulated House and Bookstore worlds are also shown.
  • Figure 4: Comparison of computation cost (left), communication cost (center), and exploration over time (right) for CQLite, RRT, and DRL in three Gazebo simulated worlds. Row-wise: top, 3 robots in the House world; middle, 3 robots in the Bookstore world; bottom, 6 robots in the Bookstore world.
  • Figure 5: Exploration map of the House world before, during, and after map sharing and merging, corresponding to points A, B, and C in the Computation and Communication plots of Fig. 4. Peaks in those plots mark CQLite's requests for map merging; RRT runs longer with persistently high communication and computational overhead but explores fewer regions than DRL and CQLite.
  • ...and 2 more figures

Theorems & Definitions (3)

  • proof
  • proof
  • proof