Deep Bayesian Future Fusion for Self-Supervised, High-Resolution, Off-Road Mapping
Shubhra Aich, Wenshan Wang, Parv Maheshwari, Matthew Sivaprakasam, Samuel Triest, Cherie Ho, Jason M. Gregory, John G. Rogers, Sebastian Scherer
TL;DR
This work tackles the challenge of high-resolution off-road mapping under long-range sparsity and sensing noise by introducing Deep Bayesian Future Fusion (DBFF), a dense BEV map completion framework that operates at 2 cm resolution over a 30 m forward view. It combines a Bayes-filter-inspired fusion mechanism with a CNN/RNN backbone and perceptual generative losses to predict dense RGB and height maps from sparse measurements, using a proximal-distal latent split and cross-attention-based measurement updates to maintain geometric consistency. The authors develop a self-supervised training regime, called Future Fusion, that generates large-scale dense ground-truth BEV maps from stereo, RGB, and LiDAR data, and they demonstrate improvements over baselines in both direct map quality (MAE, FID, SSIM) and downstream costmap prediction. The approach runs in real time (~16 Hz), and the features learned from the completed maps carry meaningful terrain information, which suggests potential for robust, pretrainable dense mapping in autonomous off-road navigation.
Abstract
High-speed off-road navigation requires long-range, high-resolution maps to enable robots to safely navigate over different surfaces while avoiding dangerous obstacles. However, due to limited computational power and sensing noise, most approaches to off-road mapping focus on producing coarse (20-40 cm) maps of the environment. In this paper, we propose Future Fusion, a framework capable of generating dense, high-resolution maps (30 m forward at 2 cm resolution) from sparse sensing data. This is accomplished by (1) an efficient realization of Bayes filtering within standard deep learning models that explicitly accounts for the sparsity pattern in stereo and LiDAR depth data, and (2) leveraging perceptual losses common in generative image completion. The proposed methodology outperforms the conventional baselines. Moreover, the learned features and the completed dense maps lead to improvements in the downstream navigation task.
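To make the scale and the filtering idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a classical per-cell Gaussian Bayes (Kalman-style) update in place of the learned fusion, and it represents missing stereo or LiDAR returns as `None`. The function names `cells_along_axis` and `fuse_sparse_measurement`, the grid layout, and the scalar measurement variance are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]
SparseGrid = List[List[Optional[float]]]


def cells_along_axis(extent_m: float, resolution_m: float) -> int:
    """Number of grid cells needed to cover extent_m at resolution_m per cell."""
    return round(extent_m / resolution_m)


def fuse_sparse_measurement(
    prior_mean: Grid,
    prior_var: Grid,
    meas: SparseGrid,
    meas_var: float,
) -> Tuple[Grid, Grid]:
    """Per-cell Gaussian Bayes update on a height map (illustrative assumption).

    Cells whose measurement is None (no sensor return) keep their prior,
    which is how this sketch accounts for the sparsity pattern.
    """
    post_mean: Grid = []
    post_var: Grid = []
    for mean_row, var_row, meas_row in zip(prior_mean, prior_var, meas):
        mean_out: List[float] = []
        var_out: List[float] = []
        for mean, var, z in zip(mean_row, var_row, meas_row):
            if z is None:
                # No observation for this cell: carry the prior forward.
                mean_out.append(mean)
                var_out.append(var)
            else:
                # Standard scalar Kalman gain for a direct observation.
                gain = var / (var + meas_var)
                mean_out.append(mean + gain * (z - mean))
                var_out.append((1.0 - gain) * var)
        post_mean.append(mean_out)
        post_var.append(var_out)
    return post_mean, post_var


if __name__ == "__main__":
    # 30 m forward range at 2 cm per cell, as stated in the abstract.
    print(cells_along_axis(30.0, 0.02))
    mean, var = fuse_sparse_measurement(
        [[0.0, 0.0]], [[1.0, 1.0]], [[2.0, None]], meas_var=1.0
    )
    print(mean, var)
```

At 2 cm resolution the 30 m forward range alone spans 1500 cells, which is why an explicit mask for unobserved cells matters: at long range most cells receive no measurement in any single frame, and in this sketch they simply retain their prior estimate.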
