I²-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM

Gwangtak Bae, Changwoon Choi, Hyeongjun Heo, Sang Min Kim, Young Min Kim

TL;DR

I²-SLAM addresses the fragility of dense visual SLAM on casually captured videos by inverting the image-formation process. It models a linear HDR radiance map together with per-frame white balance, exposure time, and camera response function, and it simulates motion blur by combining renders from multiple virtual camera poses across each frame's exposure. The method jointly optimizes the HDR map, camera trajectory, and image-formation parameters within a differentiable framework that can augment NeRF-SLAM and 3D Gaussian Splatting pipelines. Experiments on RGB and RGB-D datasets show improved rendering quality and tracking accuracy, supported by ablations and runtime analyses. The approach targets robust, photorealistic dense SLAM under real-world capture conditions, with applications in AR/VR, robotics, and scene understanding.
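The forward image-formation chain named above (white balance and exposure time scaling linear HDR radiance, followed by a camera response function) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `tone_map`, the clipping step, and the gamma-curve CRF are assumptions standing in for whatever parameterization the paper actually learns.

```python
import numpy as np

def tone_map(hdr, wb_gains, exposure, gamma=2.2):
    """Map linear HDR radiance of shape (H, W, 3) to an LDR image in [0, 1].

    wb_gains: per-channel white-balance gains for this frame (length 3)
    exposure: scalar exposure time for this frame
    gamma:    exponent of an ASSUMED gamma-curve CRF (illustrative stand-in
              for a learned camera response function)
    """
    hdr = np.asarray(hdr, dtype=float)
    wb = np.asarray(wb_gains, dtype=float).reshape(1, 1, 3)
    # White balance and exposure both scale radiance linearly.
    irradiance = hdr * wb * exposure
    # Sensor saturation: clip to the valid range before the response curve.
    irradiance = np.clip(irradiance, 0.0, 1.0)
    # Assumed CRF: a simple monotonic gamma curve.
    return irradiance ** (1.0 / gamma)
```

Rendering the same HDR map with two different exposure values yields two differently exposed LDR frames. Modeling this chain explicitly is what allows an optimizer to attribute brightness and color changes to per-frame camera settings rather than to the map itself.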

Abstract

We present an inverse image-formation module that can enhance the robustness of existing visual SLAM pipelines in casually captured scenarios. Casual video captures often suffer from motion blur and varying appearance, which degrade the quality of the final coherent 3D visual representation. We propose integrating the physical imaging process into the SLAM system, which employs linear HDR radiance maps to collect measurements. Specifically, each frame aggregates images rendered from multiple poses along the camera trajectory to explain the motion blur prevalent in hand-held videos. Additionally, we accommodate per-frame appearance variation by dedicating explicit variables to the image-formation steps, namely white balance, exposure time, and camera response function. Through joint optimization of these additional variables, the SLAM pipeline produces high-quality images with more accurate trajectories. Extensive experiments demonstrate that our approach can be incorporated into recent visual SLAM pipelines using various scene representations, such as neural radiance fields or Gaussian splatting.
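The blur model in the abstract, where one frame aggregates images from several poses along the trajectory, amounts to approximating the exposure integral by an average of sharp renders. The sketch below assumes a toy setting: a 1D scene, a hypothetical `render_sharp` that treats the pose as an integer pixel shift, and evenly spaced virtual poses. Averaging is done on linear radiance, before any tone mapping, in line with the method overview in Figure 2.

```python
import numpy as np

def render_sharp(scene, offset):
    """HYPOTHETICAL renderer stand-in: the 'pose' is an integer shift into a
    1D scene of linear radiance values (a real system would render a NeRF
    or Gaussian-splatting map)."""
    return np.roll(scene, -offset)

def render_blurry(scene, start_offset, end_offset, num_virtual=5):
    """Approximate the exposure integral by averaging sharp renders from
    num_virtual poses spaced evenly between the start and end poses."""
    offsets = np.rint(np.linspace(start_offset, end_offset, num_virtual)).astype(int)
    sharp = np.stack([render_sharp(scene, int(o)) for o in offsets])
    # Average in linear radiance space; tone mapping would be applied afterwards.
    return sharp.mean(axis=0)

scene = np.zeros(8)
scene[4] = 1.0  # a single bright point
blurred = render_blurry(scene, 0, 4, num_virtual=5)
```

Here the camera moves four pixels during the exposure, so the single bright point is smeared into a value of 0.2 at indices 0 through 4. The blur redistributes radiance without changing its total, which is why the sharp map can be recovered by fitting the averaged render to the blurry observation.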

Paper Structure

This paper contains 36 sections, 19 equations, 12 figures, 8 tables.

Figures (12)

  • Figure 1: We propose I²-SLAM, a SLAM pipeline with a physical image-formation process. It reconstructs photorealistic, sharp HDR maps from casually captured videos that contain severe motion blur and varying appearance.
  • Figure 2: Method overview. (a) We reconstruct a sharp HDR radiance field map. (b) Motion blur is simulated by integrating sharp images rendered from virtual camera poses during the exposure time; a differentiable tone-mapping module then produces the blurry LDR image. (c) SLAM methods simultaneously perform tracking and mapping from degraded images to reconstruct a sharp HDR map.
  • Figure 3: Qualitative results of applying I²-SLAM to the RGB-SLAM method.
  • Figure 4: Qualitative results of applying I²-SLAM to the RGBD-SLAM method.
  • Figure 5: Performance variations over iteration time of I²-SLAM and SplaTAM (Keetha et al., 2023).
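Figure 2(b) renders sharp images from virtual camera poses during the exposure time, but the captions do not say how those poses are parameterized. One common choice, assumed here and not taken from the paper, is to interpolate between an exposure-start pose and an exposure-end pose: spherical linear interpolation (slerp) for the rotation quaternion and linear interpolation for the translation. The names `slerp` and `virtual_poses` are hypothetical.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between quaternions in (w, x, y, z) order."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:          # q and -q are the same rotation; take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:       # nearly identical: normalized lerp avoids dividing by ~0
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def virtual_poses(q_start, t_start, q_end, t_end, n):
    """ASSUMED scheme: n poses evenly spaced over the exposure, start to end."""
    ss = np.linspace(0.0, 1.0, n)
    return [(slerp(q_start, q_end, s), (1.0 - s) * t_start + s * t_end) for s in ss]

q_id = np.array([1.0, 0.0, 0.0, 0.0])                                # identity
q_z90 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])   # 90 deg about z
poses = virtual_poses(q_id, np.zeros(3), q_z90, np.array([1.0, 0.0, 0.0]), n=3)
```

The middle pose is a 45-degree rotation about z with translation (0.5, 0, 0). Each (quaternion, translation) pair would be passed to the renderer and the sharp renders averaged in linear space before tone mapping; since the interpolation is differentiable in the endpoint poses, tracking could in principle refine both endpoints from a single blurry frame.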