VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments

Ning Liu; Sen Shen; Zheng Li; Sheng Liu; Dongkun Han; Shangke Lyu; Thomas Braunl

TL;DR

VORL-EXPLORE is a hybrid learning and planning framework for multi-robot exploration. It addresses the brittleness of decoupling frontier allocation from local navigation through execution fidelity, a shared estimate of local navigability that couples task allocation with motion execution and drives a risk-aware adaptive arbitration mechanism between global A* guidance and a reactive reinforcement learning policy.

Abstract

Hierarchical multi-robot exploration commonly decouples frontier allocation from local navigation, which can make the system brittle in dense and dynamic environments. Because the allocator lacks direct awareness of execution difficulty, robots may cluster at bottlenecks, trigger oscillatory replanning, and generate redundant coverage. We propose VORL-EXPLORE, a hybrid learning and planning framework that addresses this limitation through execution fidelity, a shared estimate of local navigability that couples task allocation with motion execution. This fidelity signal is incorporated into a fidelity-coupled Voronoi objective with inter-robot repulsion to reduce contention before it emerges. It also drives a risk-aware adaptive arbitration mechanism between global A* guidance and a reactive reinforcement learning policy, balancing long-range efficiency with safe interaction in confined spaces. The framework further supports online self-supervised recalibration of the fidelity model using pseudo-labels derived from recent progress and safety outcomes, enabling adaptation to non-stationary obstacles without manual risk tuning. We evaluate this capability separately in a dedicated severe-traffic ablation. Extensive experiments in randomized grids and a Gazebo factory scenario show high success rates, shorter path length, lower overlap, and robust collision avoidance. The source code will be made publicly available upon acceptance.
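
As a rough illustration of the arbitration idea, the sketch below switches between two controllers using a two-threshold (hysteresis) rule on a fidelity value in [0, 1]; the Figure 2 caption describes the gate as hysteretic. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class name `HysteresisGate`, the thresholds 0.4 and 0.6, and the direction of the switch (high fidelity selects A*, low fidelity selects the RL policy).

```python
from dataclasses import dataclass


@dataclass
class HysteresisGate:
    """Two-threshold switch between a global planner and a reactive policy.

    All names and values are hypothetical; the paper's learnable switch
    may be defined differently.
    """
    low: float = 0.4    # assumed: below this, hand control to the RL policy
    high: float = 0.6   # assumed: above this, return to A* guidance
    mode: str = "astar"

    def update(self, fidelity: float) -> str:
        # Switch only when the signal crosses the far threshold, so values
        # between `low` and `high` never cause the mode to flip.
        if self.mode == "astar" and fidelity < self.low:
            self.mode = "rl"
        elif self.mode == "rl" and fidelity > self.high:
            self.mode = "astar"
        return self.mode


gate = HysteresisGate()
trace = [gate.update(p) for p in (0.9, 0.5, 0.35, 0.5, 0.55, 0.7)]
print(trace)  # ['astar', 'astar', 'rl', 'rl', 'rl', 'astar']
```

The gap between the two thresholds is what suppresses rapid mode flipping when the fidelity estimate hovers near a single cut-off; a single-threshold gate would switch on every small fluctuation around that value.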

Paper Structure (26 sections, 15 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Target Assignment and Regional Partitioning
Path Planning and Motion Execution
Connecting Assignment and Execution
Online Adaptation in Exploration
Problem Definition
Environment Model
Objective
Methodology
Closed-loop architecture
Coupled frontier assignment
Motion arbitration with a learnable switch
Online self-supervised adaptation
Experiments
...and 11 more sections

Figures (5)

Figure 1: Canonical Voronoi-based frontier assignment in multi-robot exploration. Three robots share an occupancy map, extract frontiers, and form a Voronoi partition $\{V_{i,t}\}$ induced by BFS distance. Each robot selects a frontier within its region as the next exploration goal.
Figure 2: Closed-loop architecture of VORL-EXPLORE. Each robot estimates execution fidelity $p_{i,t}$ online from local cues and shared teammate states. In the task layer, $p_{i,t}$ modulates frontier-scoring weights to reduce assignments that are likely to cause congestion. In the motion layer, $p_{i,t}$ drives a hysteresis gate that selects between $A^*$ guidance and a reactive RL policy. Progress and safety outcomes generate pseudo-labels to update the fidelity estimator online, closing the coupling loop.
Figure 3: Allocator-level comparison with target allocation baselines on an $80\times80$ grid with 16 dynamic obstacles and 30% static obstacle density. All methods share the same VORL-EXPLORE execution layer. Curves show the mean over 100 runs and shaded regions indicate 95% confidence intervals.
Figure 4: Progressive ablation of the coupled architecture. Base disables both coupling links and uses decoupled target assignment and motion execution. CA enables fidelity-coupled assignment by using execution fidelity to reweight frontier scoring, while keeping the execution module unchanged. CP enables fidelity-gated switching in the execution layer, while keeping the assignment objective unchanged. Full enables both CA and CP. Bars show the mean and error bars indicate $\pm$STD over 100 runs with 4 agents on a $40\times40$ grid, 30% static obstacle density, and 32 dynamic obstacles.
Figure 5: Gazebo validation with four Pioneer3 robots in a dynamic factory environment. The normalized new coverage curve shows faster exploration than ROS explore_lite.
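
The canonical partition described in the Figure 1 caption can be sketched with a multi-source breadth-first search over an occupancy grid. This is a minimal illustration of a BFS-distance Voronoi partition, not the paper's fidelity-coupled objective. The function names `bfs_voronoi` and `frontiers_by_region`, the 4-connected grid, the 0/1 cell encoding, and the tie-breaking by queue order are all assumptions.

```python
from collections import deque


def bfs_voronoi(grid, robots):
    """Label each reachable free cell with the index of the nearest robot.

    grid: list of lists, 0 = free, 1 = obstacle (assumed encoding).
    robots: list of (row, col) start cells, assumed to be free cells.
    Distance is counted in 4-connected BFS steps; ties are resolved by
    queue order. Unreachable and obstacle cells keep the label -1.
    """
    rows, cols = len(grid), len(grid[0])
    owner = [[-1] * cols for _ in range(rows)]
    queue = deque()
    for idx, (r, c) in enumerate(robots):
        owner[r][c] = idx
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and owner[nr][nc] == -1):
                owner[nr][nc] = owner[r][c]  # inherit the expanding robot
                queue.append((nr, nc))
    return owner


def frontiers_by_region(owner, frontiers):
    """Group frontier cells by the robot whose region contains them."""
    regions = {}
    for r, c in frontiers:
        regions.setdefault(owner[r][c], []).append((r, c))
    return regions


grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
owner = bfs_voronoi(grid, robots=[(0, 0), (0, 4)])
for row in owner:
    print(row)
print(frontiers_by_region(owner, [(2, 1), (2, 3)]))
```

Because the search respects obstacles, a cell goes to the robot with the shortest traversable path rather than the smallest straight-line distance, which is the reason for using BFS distance on a grid map.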
