Direct Sparse Odometry with Continuous 3D Gaussian Maps for Indoor Environments

Jie Deng, Fengtian Lang, Zikang Yuan, Xin Yang

TL;DR

The paper tackles indoor monocular visual odometry (VO) by addressing the depth-approximation errors that arise when discrete prior maps are associated with image pixels. It introduces a continuous 3D Gaussian map rendered via differentiable Gaussian splatting to provide depth $d_p$ for every pixel, enabling direct photometric optimization without interpolation. The approach comprises a global map module that renders depth maps from the Gaussian map and a local odometry module that uses pixel-depth pairs for pose estimation, with sliding-window bundle adjustment (BA) to refine keyframe trajectories. Experimental results on two public datasets show improved tracking accuracy and robustness over baselines, and the authors have released their code.
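
To make "direct photometric optimization" with a pixel-depth pair concrete, the residual can be sketched in the usual direct-method form. This is an illustrative formulation only, not the paper's own equation: apart from $d_p$, every symbol here is an assumption ($\pi$ is the camera projection, $\mathbf{T}_{ck}$ the relative pose from keyframe $k$ to current frame $c$, $I_k$ and $I_c$ the image intensities), and the paper's actual energy may include brightness correction, patch weighting, or a robust norm.

$$
\mathbf{p}' = \pi\!\left(\mathbf{T}_{ck}\,\pi^{-1}(\mathbf{p}, d_p)\right), \qquad
r_p = I_c(\mathbf{p}') - I_k(\mathbf{p})
$$

Under this sketch, the pose is found by minimizing the sum of $r_p^2$ over the selected high-gradient points, with $d_p$ read from the rendered depth map rather than interpolated from sparse projected points.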

Abstract

Accurate localization is essential for robotics and augmented reality applications such as autonomous navigation. Vision-based methods combining prior maps aim to integrate LiDAR-level accuracy with camera cost efficiency for robust pose estimation. Existing approaches, however, often depend on unreliable interpolation procedures when associating discrete point cloud maps with dense image pixels, which inevitably introduces depth errors and degrades pose estimation accuracy. We propose a monocular visual odometry framework utilizing a continuous 3D Gaussian map, which directly assigns geometrically consistent depth values to all extracted high-gradient points without interpolation. Evaluations on two public datasets demonstrate superior tracking accuracy compared to existing methods. We have released the source code of this work for the development of the community.

Paper Structure

This paper contains 14 sections, 7 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: The top-left and top-right images are depth maps obtained by interpolation and by Gaussian splatting, respectively. The bottom image is the discrete point cloud map. While interpolation produces unreliable depth estimates in point cloud gaps, Gaussian splatting maintains consistent accuracy in these regions.
  • Figure 2: Overview of our system. The steps represented by dotted lines run only once (online initialization and offline global map processing), while the steps represented by solid lines execute continuously as the system runs.
  • Figure 3: Illustration of the depth association and projection relationship in our system. A tracking point is reconstructed from frame $k$ using its pixel-depth pair and then projected to frame $c$. The blue dotted line indicates the direct mapping relationship between depth maps and input images.
  • Figure 4: (a)-(d) and (e)-(h) are exemplar frames from sequences $icl\_1$ and $augm\_4$, respectively. (a)(e) are input images, (b)(f) are depth maps obtained by interpolation, (c)(g) are depth maps generated by our system, and (d)(h) are ground-truth depth maps. They show the difference between the depth maps obtained using the interpolation method and the depth maps rendered from the Gaussian map. The dark areas in (b) represent depth values that are invalid due to interpolation errors.
  • Figure 5: (a) is the estimated trajectory of $icl\_2$, (b) is the estimated trajectory of $icl\_3$, (c) is the estimated trajectory of $augm\_2$ and (d) is the estimated trajectory of $augm\_4$.
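
The projection relationship described in the Figure 3 caption (reconstruct a point from frame $k$ using its pixel-depth pair, then project it into frame $c$) can be sketched as follows. This is a minimal illustration assuming an undistorted pinhole camera; the function names, the intrinsics matrix `K`, and the 4x4 relative pose `T_ck` are hypothetical and are not taken from the authors' released code.

```python
import numpy as np

def backproject(K, pixel, depth):
    """Lift pixel (u, v) with its depth to a 3D point in the camera frame."""
    u, v = pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray with unit z-component
    return depth * ray

def project(K, point):
    """Project a 3D camera-frame point to pixel coordinates."""
    if point[2] <= 0:
        raise ValueError("point is behind the camera")
    uvw = K @ point
    return uvw[:2] / uvw[2]

def reproject(K, pixel, depth, T_ck):
    """Map a pixel-depth pair from frame k into frame c.

    T_ck is an assumed 4x4 rigid transform taking points from frame k to frame c.
    """
    X_k = backproject(K, pixel, depth)
    X_c = T_ck[:3, :3] @ X_k + T_ck[:3, 3]
    return project(K, X_c)

if __name__ == "__main__":
    # Hypothetical intrinsics and a small sideways camera motion.
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    T_ck = np.eye(4)
    T_ck[0, 3] = 0.1  # point shifts 0.1 along x in frame c
    print(reproject(K, (320.0, 240.0), 2.0, T_ck))
```

With these example values, the point at the principal point with depth 2 lands at pixel (345, 240) in frame $c$. In a direct method the image intensity sampled at that location would then be compared against the intensity at the original pixel in frame $k$.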