MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes

Chenyang Wu, Yifan Duan, Xinran Zhang, Yu Sheng, Jianmin Ji, Yanyong Zhang

TL;DR

MM-Gaussian is a LiDAR-camera multi-modal fusion system for localization and mapping in unbounded scenes, inspired by the recently developed 3D Gaussians. It uses the geometric structure information provided by solid-state LiDAR to address the inaccurate depth that vision-only solutions suffer from in unbounded outdoor scenarios.

Abstract

Localization and mapping are critical tasks for applications such as autonomous vehicles and robotics. Outdoor environments are particularly challenging because they are unbounded. In this work, we present MM-Gaussian, a LiDAR-camera multi-modal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently developed 3D Gaussians, which achieve high rendering quality at fast rendering speed. Specifically, our system uses the geometric structure information provided by solid-state LiDAR to address the inaccurate depth encountered when relying solely on visual solutions in unbounded outdoor scenarios. Additionally, we use 3D Gaussian point clouds, optimized by pixel-level gradient descent, to exploit the color information in photos and thereby achieve realistic rendering. To further improve the robustness of our system, we design a relocalization module that helps the system return to the correct trajectory after a localization failure. Experiments conducted in multiple scenarios demonstrate the effectiveness of our method.

Paper Structure

This paper contains 18 sections, 10 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: The library reconstructed by MM-Gaussian. We utilize LiDAR and cameras to capture scene data while estimating the sensor pose and reconstructing a 3D Gaussian map in an unbounded scene.
  • Figure 2: Overview of MM-Gaussian.
  • Figure 3: Map reconstruction failure caused by tracking failure. When tracking is successful, mapping proceeds smoothly, as illustrated by the map and green trajectory in the green box. However, a segment of data recorded facing the ground leads to trajectory drift, indicated by the red trajectory. Tracking subsequently stabilizes, but by then the map reconstructed within the red box has diverged from the correct map location shown in the green box.
  • Figure 4: Tracking fails at the $t$-th frame. We use the $(t-m)$-th frame as a recovery point to perform a look-around operation. By solving the PnP problem, the pose of the $(t+i)$-th frame is estimated successfully.
  • Figure 5: Rendering results of SplaTAM and MM-Gaussian.