NeuV-SLAM: Fast Neural Multiresolution Voxel Optimization for RGBD Dense SLAM

Wenzhi Guo, Bing Wang, Lijun Chen

TL;DR

NeuV-SLAM addresses the scalability and speed bottlenecks of neural implicit dense SLAM by introducing a hash-based multiresolution voxel system (hashMV) and a VDF implicit representation that anchors color and SDF features directly in voxels. It employs a lightweight decoder and volume rendering to jointly optimize geometry and appearance, with SDF values activated by a tanh function to sharpen surfaces. Across Replica and ScanNet, NeuV-SLAM achieves faster convergence, higher tracking accuracy, improved reconstruction fidelity, and superior rendering quality compared to strong baselines, while maintaining a scalable memory footprint through multiresolution voxels. The work advances real-time, incrementally expandable neural SLAM, with clear implications for robotics, autonomous navigation, and augmented reality.

Abstract

We introduce NeuV-SLAM, a novel dense simultaneous localization and mapping pipeline based on neural multiresolution voxels, characterized by ultra-fast convergence and incremental expansion capabilities. The pipeline takes RGBD images as input and constructs multiresolution neural voxels, achieving rapid convergence while maintaining robust incremental scene reconstruction and camera tracking. Central to our method is a novel implicit representation, termed VDF, that combines neural signed distance field (SDF) voxels with an SDF activation strategy. Color features and SDF values anchored within the voxels are optimized directly, which substantially speeds up scene convergence. The SDF activation is designed to produce clearly delineated edges and preserves scene representation fidelity even when voxel resolution is limited. Furthermore, to support rapid incremental expansion with low computational overhead, we develop hashMV, a novel hash-based multiresolution voxel management structure, complemented by a voxel generation technique that exploits a two-dimensional scene prior. Our evaluations on the Replica and ScanNet datasets demonstrate NeuV-SLAM's effectiveness in terms of convergence speed, tracking accuracy, scene reconstruction, and rendering quality.
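
To make the idea of SDF values anchored in voxels and passed through an activation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes raw SDF values are stored at the eight corners of a voxel, interpolated trilinearly at a query point, and then squashed by tanh. The function names, the `scale` parameter, and the order of interpolation and activation are assumptions made for illustration.

```python
import math


def trilinear(corners, u, v, w):
    """Trilinearly interpolate 8 corner values.

    corners[i][j][k] is the value at the voxel corner (x=i, y=j, z=k);
    u, v, w are the query point's local coordinates in [0, 1].
    """
    # Interpolate along x on each of the four voxel edges parallel to x.
    c00 = corners[0][0][0] * (1 - u) + corners[1][0][0] * u  # y=0, z=0
    c01 = corners[0][0][1] * (1 - u) + corners[1][0][1] * u  # y=0, z=1
    c10 = corners[0][1][0] * (1 - u) + corners[1][1][0] * u  # y=1, z=0
    c11 = corners[0][1][1] * (1 - u) + corners[1][1][1] * u  # y=1, z=1
    # Interpolate along y, then along z.
    c0 = c00 * (1 - v) + c10 * v  # z=0
    c1 = c01 * (1 - v) + c11 * v  # z=1
    return c0 * (1 - w) + c1 * w


def activated_sdf(corners, u, v, w, scale=5.0):
    """Apply tanh to the interpolated raw value (scale is a hypothetical knob)."""
    return math.tanh(scale * trilinear(corners, u, v, w))


# A voxel crossed by a planar surface at x = 0.5 (raw values -0.1 and +0.1).
corners = [[[-0.1] * 2 for _ in range(2)], [[0.1] * 2 for _ in range(2)]]
on_surface = activated_sdf(corners, 0.5, 0.3, 0.7)
at_corner = activated_sdf(corners, 1.0, 0.0, 0.0)
```

In this sketch a larger `scale` makes the activated value change more steeply near the zero crossing, which is one plausible reading of how an activation could sharpen surfaces at a fixed voxel resolution.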

Paper Structure

This paper contains 34 sections, 10 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: We propose NeuV-SLAM, a SLAM system that incrementally reconstructs scenes from sequential RGBD frames. NeuV-SLAM reconstructs the scene separately based on dense voxels and sparse voxels.
  • Figure 2: Overview of NeuV-SLAM. NeuV-SLAM takes RGBD images as input and directly anchors color features and SDF values in multiresolution voxels to estimate camera pose and learn scene representation. From left to right, during the mapping stage, SDF values obtained directly through activated trilinear interpolation efficiently learn scene geometry, and scene color information is learned through interpolated neural features. The depth and color values are rendered through volumetric rendering, minimizing color, depth, and SDF losses to optimize the network $G_g$. From right to left, during the tracking stage, $G_g$ parameters are fixed, and the camera pose is updated through backward propagation. Incremental expansion of the scene is achieved through the hashMV structure. The tracking and mapping stages alternate until the entire SLAM process is completed, with the multiresolution voxels converging to a finite set.
  • Figure 3: The process of multiresolution voxel generation. Edge Detection: This step entails identifying boundaries within an image by pinpointing areas of significant brightness discontinuities. Sort: The detected edge points are then prioritized and resequenced, with a focus on these points for subsequent processing. Key Calculation: Utilizing the positional data of each point, a unique identifier or 'key' is computed. Lookup: This key is used to search within a hash table. Absent keys prompt new key generation. Insert: Voxel creation is guided by these keys. Edge points lead to denser voxel formation, while non-edge points result in sparser voxels. Existing occupied spaces result in key disposal.
  • Figure 4: The lightweight network architecture of $G_g$.
  • Figure 5: Qualitative tracking result on the Replica and ScanNet Datasets. We project the trajectory in three-dimensional space onto the x-y plane.
  • ...and 4 more figures
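
The voxel generation steps in the Figure 3 caption (sort, key calculation, lookup, insert) can be sketched roughly as follows. This is an illustrative approximation rather than the hashMV structure itself: a Python dict stands in for the hash table, the two voxel sizes `FINE` and `COARSE` are made-up values, the helper names are hypothetical, and edge detection is assumed to have already labelled each back-projected 3D point. Overlap between fine and coarse voxels is not modelled here.

```python
import math

FINE, COARSE = 0.1, 0.4  # hypothetical voxel edge lengths in metres


def voxel_key(point, size):
    """Key calculation: quantize a 3D point to its integer voxel index."""
    return tuple(math.floor(c / size) for c in point)


def insert_points(table, points):
    """Insert voxels for (xyz, is_edge) pairs; return how many were created."""
    created = 0
    # Sort: edge points are processed first (False sorts before True).
    for xyz, is_edge in sorted(points, key=lambda p: not p[1]):
        level, size = (0, FINE) if is_edge else (1, COARSE)
        key = (level, voxel_key(xyz, size))
        # Lookup: a key that is already present means the space is occupied,
        # so the key is discarded.
        if key in table:
            continue
        # Insert: edge points create small voxels, other points large ones.
        table[key] = {"size": size}
        created += 1
    return created


table = {}
points = [
    ((0.05, 0.05, 0.05), True),   # edge point, creates a fine voxel
    ((0.06, 0.02, 0.01), True),   # same fine voxel, discarded
    ((0.30, 0.30, 0.30), False),  # non-edge point, creates a coarse voxel
    ((1.00, 1.00, 1.00), False),  # a different coarse voxel
]
first_pass = insert_points(table, points)
second_pass = insert_points(table, points)  # nothing new on a repeated frame
```

Because repeated observations of the same region map to keys that already exist, the table stops growing once the scene has been fully seen, which matches the caption's note that occupied spaces lead to key disposal.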