Hilbert-Augmented Reinforcement Learning for Scalable Multi-Robot Coverage and Exploration

Tamil Selvan Gurunathan; Aryya Gangopadhyay

TL;DR

Results indicate that geometric priors improve autonomy and scalability for swarm and legged robotics.

Abstract

We present a coverage framework that integrates Hilbert space-filling priors into decentralized multi-robot learning and execution. We augment DQN and PPO with Hilbert-based spatial indices to structure exploration and reduce redundancy in sparse-reward environments, and we evaluate scalability in multi-robot grid coverage. We further describe a waypoint interface that converts Hilbert orderings into curvature-bounded, time-parameterized SE(2) trajectories (planar (x, y, θ)), enabling onboard feasibility on resource-constrained robots. Experiments show improvements in coverage efficiency, redundancy, and convergence speed over DQN/PPO baselines. In addition, we validate the approach on a Boston Dynamics Spot legged robot, executing the generated trajectories in indoor environments and observing reliable coverage with low redundancy. These results indicate that geometric priors improve autonomy and scalability for swarm and legged robotics.
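To make the "Hilbert-based spatial index" concrete, the following is a minimal sketch of the standard cell-to-index mapping on a square grid. It is not the authors' implementation. It assumes a grid whose side length is a power of two, and the names `hilbert_index` and `hilbert_order` are hypothetical. How the index is fed into the DQN/PPO state or reward is not shown here.

```python
from typing import List, Tuple


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) on an n x n grid to its position along a Hilbert curve.

    Assumption: n is a power of two (the classic bitwise construction).
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell lies outside the grid")
    d = 0
    s = n // 2
    while s > 0:
        # Select the quadrant at the current scale.
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-quadrant is traversed in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(n: int) -> List[Tuple[int, int]]:
    """Return all grid cells sorted by Hilbert index (a locality-preserving sweep)."""
    cells = [(x, y) for x in range(n) for y in range(n)]
    return sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))


if __name__ == "__main__":
    # On a 2 x 2 grid the sweep is (0, 0), (0, 1), (1, 1), (1, 0).
    print(hilbert_order(2))
```

Consecutive cells in this ordering are always grid neighbors, which is the locality property that makes the index useful as an exploration prior: an agent that follows increasing index values sweeps the area without long jumps.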

Paper Structure (21 sections, 4 equations, 8 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Preliminaries
Methodology
Hilbert-Augmented DQN
Hilbert-Augmented PPO
State Augmentation and Reward Shaping
Implementation on Spot (Waypoint Interface)
Frames and safety.
Latency and communications.
Experiments
Experiments with Physical Robots
Trial protocol.
Parameter sensitivity.
Failure modes.
...and 6 more sections

Figures (8)

Figure 1: Flowcharts illustrating the integration of Hilbert index into the H-DQN (left) and H-PPO (right) architectures.
Figure 2: Real-robot execution of the PPO policy on a Boston Dynamics Spot in a $10\,\mathrm{m}\times10\,\mathrm{m}$ area (5$\times$5 grid). Policy-generated waypoints (markers) are converted to discrete $\mathrm{SE}(2)$ commands (linear steps and $\pm 30^{\circ}$ turns) via the Spot SDK; the overlaid trace shows onboard odometry during execution. PPO covers all cells but exhibits additional revisits and a longer traversal compared to the Hilbert-guided variant in Fig. 3.
Figure 3: Real-robot execution of the H-PPO policy under the same conditions as Fig. 2. Hilbert-index augmentation yields a locality-preserving sweep with fewer revisits and shorter coverage time. Planned waypoints align closely with the executed odometry, indicating high path fidelity at medium walking speed.
Figure 4: Average cumulative reward across different agent counts for DQN, H-DQN, PPO, and H-PPO. Hilbert-augmented agents consistently achieve higher reward, particularly as the team size increases.
Figure 5: Comparison between H-DQN and standard DQN across cumulative reward, coverage ratio, and redundancy. Hilbert-DQN shows superior scalability and efficiency, especially at higher agent counts.
...and 3 more figures
