Sim2Dust: Mastering Dynamic Waypoint Tracking on Granular Media

Andrej Orsula, Matthieu Geist, Miguel Olivares-Mendez, Carol Martinez

TL;DR

This work tackles the sim-to-real gap in rover navigation on granular regolith by building a complete framework that trains policies in massively parallel, procedurally varied simulations (Space Robotics Bench, SRB) and transfers them zero-shot to a physical Leo Rover in the lunar-analogue LunaLab facility. The authors compare RL algorithms and action smoothing filters, showing that DreamerV3 achieves superior zero-shot performance and sample efficiency, and that the diversity provided by procedural content generation (PCG) is crucial for generalization. They also show that fine-tuning with high-fidelity particle physics offers limited gains at a high computational cost, and that perceptual gaps in vision-based control pose a significant challenge. The results establish a practical workflow for robust, learning-based autonomous traversal in off-world environments, while identifying key limitations and avenues for future sensor- and perception-focused improvements.

Abstract

Reliable autonomous navigation across the unstructured terrains of distant planetary surfaces is a critical enabler for future space exploration. However, the deployment of learning-based controllers is hindered by the inherent sim-to-real gap, particularly for the complex dynamics of wheel interactions with granular media. This work presents a complete sim-to-real framework for developing and validating robust control policies for dynamic waypoint tracking on such challenging surfaces. We leverage massively parallel simulation to train reinforcement learning agents across a vast distribution of procedurally generated environments with randomized physics. These policies are then transferred zero-shot to a physical wheeled rover operating in a lunar-analogue facility. Our experiments systematically compare multiple reinforcement learning algorithms and action smoothing filters to identify the most effective combinations for real-world deployment. Crucially, we provide strong empirical evidence that agents trained with procedural diversity achieve superior zero-shot performance compared to those trained on static scenarios. We also analyze the trade-offs of fine-tuning with high-fidelity particle physics, which offers minor gains in low-speed precision at a significant computational cost. Together, these contributions establish a validated workflow for creating reliable learning-based navigation systems, marking a substantial step towards deploying autonomous robots in the final frontier.
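To make the idea of an action smoothing filter concrete, the following is a minimal sketch of one common choice, a first-order exponential moving average applied to the policy's action vector before it reaches the motors. The text above does not name the filters that were compared, so the filter type, the class name `EmaActionFilter`, and the `alpha` parameter are all illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class EmaActionFilter:
    """Hypothetical first-order low-pass (exponential moving average) filter.

    Assumption: this is one plausible smoothing filter, not necessarily
    one of those evaluated in the paper.
    """

    alpha: float  # weight of the newest action, in (0, 1]; 1.0 disables smoothing
    _state: Optional[List[float]] = field(default=None, init=False)

    def __post_init__(self) -> None:
        if not 0.0 < self.alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")

    def reset(self) -> None:
        """Forget the previous action, e.g. at the start of an episode."""
        self._state = None

    def __call__(self, action: Sequence[float]) -> List[float]:
        if self._state is None:
            # The first action passes through unchanged.
            self._state = list(action)
        else:
            # Blend the new action with the previously smoothed one.
            self._state = [
                self.alpha * a + (1.0 - self.alpha) * s
                for a, s in zip(action, self._state)
            ]
        return list(self._state)


# Usage: smooth a short sequence of raw two-dimensional policy outputs.
smoother = EmaActionFilter(alpha=0.5)
raw_actions = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
smoothed = [smoother(a) for a in raw_actions]
```

A smaller `alpha` suppresses abrupt command changes more strongly, at the price of a slower response to a moving waypoint, which is the kind of trade-off a comparison of smoothing filters would have to weigh.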

Paper Structure

This paper contains 16 sections, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Agents are trained to track dynamic waypoints in procedurally generated scenarios of the Space Robotics Bench. The generalization learned from diverse experience enables the acquired policies to be transferred zero-shot to a physical rover on granular media in a lunar-analogue facility.
  • Figure 2: Visualization of our high-fidelity simulation environment. A procedurally generated terrain mesh is populated with millions of discrete particles that enable a more realistic simulation of complex wheel-regolith interaction dynamics.
  • Figure 3: SRB supports massively parallel simulation in two primary regimes. In the stacked regime, all environment instances are superimposed and share a single static terrain, which risks policy overfitting. In contrast, the procedural regime exposes each instance to a unique procedurally generated terrain to foster robustness and generalization. Blue arrows indicate the dynamically evolving target waypoints.
  • Figure 4: Real-world validation is performed with a Leo Rover inside the LunaLab facility [ludivig2020building], which serves as a lunar-analogue testbed filled with basalt gravel. The facility is equipped with a Sun emulator and a motion capture system for ground-truth state estimation during policy execution and evaluation.
  • Figure 5: Learning curves of the RL algorithms during training in the SRB simulation, averaged over five random seeds, with shaded regions representing the standard deviation.
  • ...and 4 more figures
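
The aggregation described in the Figure 5 caption, a mean curve over several random seeds with a standard-deviation band, can be sketched as follows. The function name `aggregate_curves` and the sample numbers are hypothetical, and the use of the population standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def aggregate_curves(
    curves: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return the per-step mean and standard deviation across seeds.

    `curves` holds one learning curve per seed, all of the same length.
    Assumption: population standard deviation (divide by N).
    """
    if not curves:
        raise ValueError("at least one curve is required")
    if len({len(c) for c in curves}) != 1:
        raise ValueError("all curves must have the same number of steps")
    # zip(*curves) regroups the values by training step instead of by seed.
    means = [mean(step) for step in zip(*curves)]
    stds = [pstdev(step) for step in zip(*curves)]
    return means, stds


# Usage with made-up returns from two seeds over three evaluation steps.
example_curves = [[0.0, 1.0, 4.0], [2.0, 3.0, 4.0]]
mean_curve, std_curve = aggregate_curves(example_curves)
# The shaded band would span mean_curve[i] - std_curve[i] to mean_curve[i] + std_curve[i].
```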