Vox-Fusion++: Voxel-based Neural Implicit Dense Tracking and Mapping with Multi-maps
Hongjia Zhai, Hai Li, Xingrui Yang, Gan Huang, Yuhang Ming, Hujun Bao, Guofeng Zhang
TL;DR
Vox-Fusion++ tackles robust, real-time dense SLAM by unifying voxel-based neural implicit surfaces with traditional volumetric fusion in a dynamically expanding octree. It combines sparse voxel embeddings, an SDF decoder $F_{\theta}$ trained on the fly, and differentiable rendering to recover accurate geometry and color. It also adopts a multi-map framework with loop closure and hierarchical pose optimization to scale to large scenes. Key contributions include dynamic voxel expansion without predefined scene bounds, a multi-map incremental mapping strategy, loop detection based on both appearance and geometry, and intra- and inter-map optimization that reduces drift and removes duplicate geometry, all with favorable runtime and memory costs. The approach supports occlusion handling in AR and collaborative mapping across multiple agents, and it shows strong reconstruction quality and efficiency on benchmarks and large real-world scenes.
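The sketch below shows how SDF values from a decoder such as $F_{\theta}$ can be turned into rendered depth and color along a single ray. It assumes a bell-shaped weight $w_i = \sigma(s_i/t)\,\sigma(-s_i/t)$, where $s_i$ is the SDF at sample $i$ and $t$ is a truncation distance. Some neural RGB-D methods use this form, but it is not necessarily the paper's exact formulation. The function names and the `truncation` default are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Color = Tuple[float, float, float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid (avoids exp overflow)."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sdf_weights(sdfs: Sequence[float], truncation: float) -> List[float]:
    """Assumed bell-shaped weights that peak where the SDF crosses zero."""
    return [sigmoid(s / truncation) * sigmoid(-s / truncation) for s in sdfs]


def render_ray(
    depths: Sequence[float],
    sdfs: Sequence[float],
    colors: Sequence[Color],
    truncation: float = 0.1,  # hypothetical truncation distance
) -> Tuple[Optional[float], Optional[Color]]:
    """Composite depth and color as a normalized weighted sum over samples."""
    weights = sdf_weights(sdfs, truncation)
    total = sum(weights)
    if total == 0.0:
        return None, None  # no sample contributes; the ray hits nothing
    depth = sum(w * d for w, d in zip(weights, depths)) / total
    color = tuple(
        sum(w * c[k] for w, c in zip(weights, colors)) / total for k in range(3)
    )
    return depth, color  # type: ignore[return-value]


if __name__ == "__main__":
    # Three samples straddling a surface at depth 2.0, all colored red.
    d, c = render_ray([1.9, 2.0, 2.1], [0.1, 0.0, -0.1], [(1.0, 0.0, 0.0)] * 3)
    print(d, c)
```

Because every operation here is differentiable in the SDF and color values, gradients of a depth or color loss can flow back to the decoder and voxel embeddings. This is the property that differentiable rendering relies on.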
Abstract
In this paper, we introduce Vox-Fusion++, a robust multi-map dense tracking and mapping system that fuses neural implicit representations with traditional volumetric fusion techniques. Building on implicit mapping and positioning systems, our approach extends this paradigm to real-world scenarios. Our system employs a voxel-based neural implicit surface representation, which enables efficient encoding and optimization of the scene within each voxel. To handle diverse environments without prior knowledge, we use an octree-based structure for scene division and dynamic expansion. To achieve real-time performance, we propose a high-performance multi-process framework, which makes the system suitable for applications with stringent time constraints. Additionally, we adopt a multi-map strategy to handle large-scale scenes, and we leverage loop detection and hierarchical pose optimization to reduce long-term pose drift and remove duplicate geometry. Comprehensive evaluations show that our method outperforms previous methods in reconstruction quality and accuracy across various scenarios. We also show that Vox-Fusion++ can be used in augmented reality and collaborative mapping applications. Our source code will be publicly available at \url{https://github.com/zju3dv/Vox-Fusion_Plus_Plus}.
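The key idea behind dynamic expansion is that voxels are allocated only where observed points land, so no scene bounds need to be fixed in advance. The sketch below illustrates this idea. It uses a plain hash map keyed by integer voxel coordinates as a simpler stand-in for the paper's octree. The class name `SparseVoxelMap`, the `embed_dim` parameter, and the zero-initialized embeddings are all hypothetical illustration choices, not the system's actual data structure.

```python
import math
from typing import Dict, Iterable, List, Tuple

VoxelKey = Tuple[int, int, int]
Point = Tuple[float, float, float]


class SparseVoxelMap:
    """Hypothetical unbounded sparse voxel grid (hash map, not an octree)."""

    def __init__(self, voxel_size: float, embed_dim: int = 16) -> None:
        self.voxel_size = voxel_size
        self.embed_dim = embed_dim
        # Each allocated voxel stores a learnable feature vector (embedding).
        self.embeddings: Dict[VoxelKey, List[float]] = {}

    def key(self, point: Point) -> VoxelKey:
        """Map a world-space point to integer voxel coordinates.

        math.floor keeps negative coordinates correct (e.g. -0.1 -> -1).
        """
        x, y, z = point
        s = self.voxel_size
        return (math.floor(x / s), math.floor(y / s), math.floor(z / s))

    def insert_points(self, points: Iterable[Point]) -> int:
        """Allocate embeddings for any newly observed voxels.

        Returns the number of voxels created by this call.
        """
        created = 0
        for p in points:
            k = self.key(p)
            if k not in self.embeddings:
                self.embeddings[k] = [0.0] * self.embed_dim
                created += 1
        return created


if __name__ == "__main__":
    grid = SparseVoxelMap(voxel_size=0.25)
    # Two nearby points share one voxel; a far-away point simply adds another.
    print(grid.insert_points([(0.05, 0.05, 0.05), (0.1, 0.1, 0.1), (100.0, -3.0, 0.5)]))
    print(sorted(grid.embeddings))
```

A hash map gives constant-time lookup but no spatial hierarchy. An octree, as the abstract describes, additionally supports coarse-to-fine subdivision and efficient ray-voxel intersection queries, which is one reason to prefer it for rendering.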
