COMO: Compact Mapping and Odometry
Eric Dexheimer, Andrew J. Davison
TL;DR
COMO is a real-time monocular SLAM system that encodes dense geometry with a compact set of 3D anchor points and decodes dense depth maps through per-keyframe depth covariance functions. This representation enables joint optimization of camera poses and dense geometry with intrinsic 3D consistency. The system pairs a Gaussian process (GP) based depth covariance function with a compact backend and a visual frontend that uses the covariance for visibility and active anchor-point initialization, while running in real time. On Replica, TUM, and ScanNet, COMO is reported to achieve better pose accuracy and depth consistency than both sparse baselines and earlier dense-prior baselines. The work shows that combining compact 3D priors with learned covariances makes dense, consistent geometry tractable in real-time monocular settings, with possible extensions to full map-centric SLAM and learned correspondences.
Abstract
We present COMO, a real-time monocular mapping and odometry system that encodes dense geometry via a compact set of 3D anchor points. Decoding anchor point projections into dense geometry via per-keyframe depth covariance functions guarantees that depth maps are joined together at visible anchor points. The representation enables joint optimization of camera poses and dense geometry, intrinsic 3D consistency, and efficient second-order inference. To maintain a compact yet expressive map, we introduce a frontend that leverages the covariance function for tracking and initializing potentially visually indistinct 3D points across frames. Altogether, we introduce a real-time system capable of estimating accurate poses and consistent geometry.
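To make the decoding step concrete, the following is a minimal sketch of how a covariance function can turn a few anchor depths into a dense depth map, using the standard GP conditional mean. It is not the authors' implementation: the fixed squared-exponential kernel over pixel coordinates stands in for the learned per-keyframe covariance, the constant mean is a simplification, and the names rbf_kernel, decode_dense_depth, length_scale, and jitter are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=8.0):
    """Squared-exponential kernel between two sets of 2D pixel coordinates.

    Assumption: a hand-picked stationary kernel stands in for the learned
    depth covariance function described in the abstract.
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def decode_dense_depth(anchor_px, anchor_depth, height, width, jitter=1e-8):
    """GP conditional mean of depth at every pixel, given depths at anchors."""
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (x, y)
    # Covariance among anchors (with jitter for stability) and pixel-to-anchor.
    K_mm = rbf_kernel(anchor_px, anchor_px) + jitter * np.eye(len(anchor_px))
    K_nm = rbf_kernel(grid, anchor_px)
    mean = anchor_depth.mean()  # constant prior mean, a simplifying assumption
    weights = np.linalg.solve(K_mm, anchor_depth - mean)
    return (mean + K_nm @ weights).reshape(height, width)

# Four hypothetical anchor projections (x, y) and their depths in one keyframe.
anchor_px = np.array([[4.0, 4.0], [20.0, 6.0], [10.0, 18.0], [26.0, 22.0]])
anchor_depth = np.array([1.0, 1.5, 2.0, 2.5])
depth = decode_dense_depth(anchor_px, anchor_depth, height=24, width=32)
```

Because the jitter is tiny, the decoded map passes (almost exactly) through the anchor depths at the anchor pixels. In this sketch, that is what ties the depth maps of different keyframes together wherever they observe the same anchor.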
