COMO: Compact Mapping and Odometry

Eric Dexheimer, Andrew J. Davison

TL;DR

COMO is a real-time monocular SLAM system that encodes dense geometry with a compact set of 3D anchor points and decodes dense depth maps through per-keyframe depth covariance functions. This allows camera poses and dense geometry to be optimized jointly while keeping the geometry intrinsically 3D-consistent. The approach combines a Gaussian process (GP) depth covariance function with a compact backend and a visual frontend that uses the covariance for visibility checks and for actively initializing new anchor points, and it runs in real time. On Replica, TUM, and ScanNet, COMO is reported to achieve better pose accuracy and depth consistency than both sparse baselines and earlier dense-prior baselines. The work indicates that combining compact 3D priors with learned covariances makes dense, consistent geometry tractable in real-time monocular settings, with potential extensions to full map-centric SLAM and learned correspondences.

Abstract

We present COMO, a real-time monocular mapping and odometry system that encodes dense geometry via a compact set of 3D anchor points. Decoding anchor point projections into dense geometry via per-keyframe depth covariance functions guarantees that depth maps are joined together at visible anchor points. The representation enables joint optimization of camera poses and dense geometry, intrinsic 3D consistency, and efficient second-order inference. To maintain a compact yet expressive map, we introduce a frontend that leverages the covariance function for tracking and initializing potentially visually indistinct 3D points across frames. Altogether, we introduce a real-time system capable of estimating accurate poses and consistent geometry.
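
The decoding step described in the abstract can be illustrated with a minimal sketch of Gaussian process conditioning: dense depth is predicted as the conditional mean given the depths at a few anchor-point projections. This is not the authors' implementation. The fixed squared-exponential kernel stands in for the learned per-keyframe covariance function, and the names `rbf`, `solve`, `decode_depth`, the length scale, and the constant prior mean are all assumptions made for illustration.

```python
import math

def rbf(p, q, length_scale=40.0):
    """Squared-exponential covariance between two pixel coordinates.
    Assumption: a stand-in for the learned depth covariance function."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-0.5 * d2 / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def decode_depth(anchor_px, anchor_depth, query_px, jitter=1e-9):
    """GP conditional mean of depth at query pixels given anchor depths."""
    mean = sum(anchor_depth) / len(anchor_depth)  # constant prior mean (assumption)
    K = [[rbf(p, q) + (jitter if i == j else 0.0)
          for j, q in enumerate(anchor_px)]
         for i, p in enumerate(anchor_px)]
    alpha = solve(K, [d - mean for d in anchor_depth])
    return [mean + sum(rbf(x, p) * a for p, a in zip(anchor_px, alpha))
            for x in query_px]

# Hypothetical anchor projections (pixels) and their depths in one keyframe.
anchors = [(20.0, 20.0), (100.0, 30.0), (60.0, 90.0)]
depths = [1.5, 2.0, 2.5]
dense = decode_depth(anchors, depths, [(u, v) for v in (0, 50, 100) for u in (0, 50, 100)])
```

With near-zero observation noise, the decoded depth passes through the anchor depths at the anchor pixels. Under this sketch, that is why two keyframes observing the same 3D point produce depth maps that meet at that point.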

Paper Structure (26 sections, 17 equations, 9 figures, 10 tables)

Figures (9)

  • Figure 1: COMO encodes scene geometry via a compact set of 3D anchor points and decodes dense geometry via per-keyframe depth covariance functions. The 3D points visualized in red anchor depth maps together from multiple views while the covariance function generates dense geometry by conditioning on sparse point projections.
  • Figure 2: Reconstructions on Replica and geometry properties of different dense map representations. The dotted line represents the true surface. (a) Densely reconstructing a large number of conditionally independent 3D points given poses enables accurate pose estimation and many accurate points, but there is no guarantee of coherent geometry. (b) Depth priors produce smooth depth maps, but even with inter-frame consistency losses, can produce inconsistent geometry and bias preventing global consistency. (c) A compact set of 3D points and depth covariance functions anchor depth maps together, leading to consistent pose estimation and dense geometry.
  • Figure 3: Overview of compact mapping framework. (a) Anchor points are projected to visible keyframes. (b) Keyframes decode into dense depth and backproject geometry to 3D. (c) Target frames enforce dense photo-consistency. (d) Optimizing dense alignment error leads to updated poses and geometry with greater 3D consistency.
  • Figure 4: Overview of visibility checks between two keyframes (KF). Matches are shown in blue, rejected matches in red, and newly initialized points in green. Note that occluded edges are rejected, while non-visually distinct points are often matched. New points are allocated to geometrically complex regions while the table is already well-represented.
  • Figure 4: Depth absolute relative error.
  • ...and 4 more figures
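
Steps (a) and (b) in the Figure 3 caption, projecting anchor points into keyframes and backprojecting decoded depth to 3D, can be sketched with a standard pinhole camera model. This is a hedged illustration only: the function names, the world-to-camera pose convention (`R_cw`, `t_cw`), and the intrinsics values are assumptions and are not taken from the paper.

```python
import math

def project(point_w, R_cw, t_cw, fx, fy, cx, cy):
    """Project a world-frame 3D point into a keyframe; returns (u, v, depth)."""
    x, y, z = (sum(R_cw[i][j] * point_w[j] for j in range(3)) + t_cw[i]
               for i in range(3))
    return fx * x / z + cx, fy * y / z + cy, z

def backproject(u, v, depth, R_cw, t_cw, fx, fy, cx, cy):
    """Lift a pixel with known depth back to a world-frame 3D point."""
    p_c = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Invert the rigid transform: p_w = R_cw^T (p_c - t_cw)
    d = [p_c[i] - t_cw[i] for i in range(3)]
    return [sum(R_cw[j][i] * d[j] for j in range(3)) for i in range(3)]

# Hypothetical keyframe: small rotation about the y-axis plus a translation.
a = 0.2
R = [[math.cos(a), 0.0, math.sin(a)],
     [0.0, 1.0, 0.0],
     [-math.sin(a), 0.0, math.cos(a)]]
t = [0.1, 0.0, 0.5]
K = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

anchor = [0.3, -0.2, 2.0]
u, v, z = project(anchor, R, t, **K)
recovered = backproject(u, v, z, R, t, **K)
```

Projecting an anchor and backprojecting it with its own depth recovers the original 3D point, which is the consistency the photometric alignment in steps (c) and (d) relies on.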