CodedVO: Coded Visual Odometry
Sachin Shah, Naitri Rajyaguru, Chahat Deep Singh, Christopher Metzler, Yiannis Aloimonos
TL;DR
This work tackles scale ambiguity in monocular visual odometry (VO) by encoding metric depth into imagery using a phase-mask coded aperture. A two-stage pipeline first predicts metric depth from coded RGB frames, trained with a depth-weighted loss, and then fuses the predicted depth with the RGB imagery to perform VO using existing RGB-D SLAM frameworks. Key contributions include the CodedVO framework, a depth-weighted loss (L_dw), a coded optics simulation dataset, and a demonstration of state-of-the-art monocular VO with known scale, achieving an average ATE of 0.08 m on ICL-NUIM. The approach enables accurate, scale-aware VO in indoor scenes with a compact optical setup, which makes it promising for small, resource-constrained robots.
Abstract
Autonomous robots often rely on monocular cameras for odometry estimation and navigation. However, the scale ambiguity problem presents a critical barrier to effective monocular visual odometry. In this paper, we present CodedVO, a novel monocular visual odometry method that overcomes the scale ambiguity problem by employing custom optics to physically encode metric depth information into imagery. By incorporating this information into our odometry pipeline, we achieve state-of-the-art performance in monocular visual odometry with a known scale. We evaluate our method in diverse indoor environments and demonstrate its robustness and adaptability. We achieve a 0.08 m average trajectory error in odometry evaluation on the ICL-NUIM indoor odometry dataset.
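To make the reported trajectory error concrete, the following is a minimal sketch of how such a metric is commonly computed. It assumes the figure refers to the widely used absolute trajectory error (ATE), taken as the root-mean-square of position differences between time-matched estimated and ground-truth camera positions that have already been aligned into a common frame. The function name `ate_rmse` and the toy trajectories are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square translational error between two trajectories.

    Assumption: both trajectories are lists of (x, y, z) positions in metres,
    matched one-to-one in time and already expressed in the same frame.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")

    # Squared Euclidean distance between each pair of corresponding positions.
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example (made-up numbers): a straight-line path with small drift.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.1)]
print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m")
```

Because a monocular system without known scale can only recover a trajectory up to an unknown scale factor, evaluations of such systems typically fit a scale during alignment. A method that recovers metric depth can instead be compared against ground truth in metres directly, which is the sense in which the error above is computed.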
