CodedVO: Coded Visual Odometry

Sachin Shah, Naitri Rajyaguru, Chahat Deep Singh, Christopher Metzler, Yiannis Aloimonos

TL;DR

This work tackles scale ambiguity in monocular visual odometry (VO) by encoding metric depth into imagery using a phase-mask coded aperture. A two-stage pipeline first predicts metric depth from coded RGB frames using a network trained with a depth-weighted loss, then fuses the predicted depth with the RGB imagery to perform VO with existing RGB-D SLAM frameworks. Key contributions include the CodedVO framework, the depth-weighted loss ($\mathcal{L}_{dw}$), a coded optics simulation dataset, and state-of-the-art monocular VO with known scale, with an average absolute trajectory error (ATE) of 0.08 m on ICL-NUIM. The approach enables accurate, scale-aware VO in indoor scenes with a compact optical setup, which makes it promising for small, resource-constrained robots.
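
To make the scale ambiguity concrete, the following minimal sketch (not taken from the paper) projects a 3D point through a pinhole camera, then repeats the projection after scaling both the scene and the camera translation by the same factor. The focal length, point, translation, and scale factor are arbitrary illustrative values, and the helper names are hypothetical. The two projections land on the same pixel, which is why an uncoded monocular image sequence alone cannot recover metric scale.

```python
import math

def project(point, cam_t, f=500.0):
    """Pinhole projection of a 3D point seen from a camera translated by cam_t (no rotation)."""
    x, y, z = (p - t for p, t in zip(point, cam_t))
    return (f * x / z, f * y / z)

def scaled(vec, s):
    """Scale every component of a 3-vector by s."""
    return tuple(s * c for c in vec)

def same_pixels(a, b, tol=1e-9):
    """True if two pixel coordinates agree up to floating-point tolerance."""
    return all(math.isclose(u, v, rel_tol=tol, abs_tol=tol) for u, v in zip(a, b))

# Illustrative values only (assumptions, not from the paper).
point = (0.4, -0.2, 3.0)   # scene point, metres
cam_t = (0.1, 0.0, 0.5)    # camera translation, metres
s = 2.5                    # unknown global scale factor

pix_metric = project(point, cam_t)
pix_scaled = project(scaled(point, s), scaled(cam_t, s))

# The scale factor cancels in x/z and y/z, so the image is unchanged.
print(same_pixels(pix_metric, pix_scaled))
```

A metric depth value for even one pixel pins down `s`, which is the role the coded-aperture depth prediction plays in the pipeline described above.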

Abstract

Autonomous robots often rely on monocular cameras for odometry estimation and navigation. However, the scale ambiguity problem presents a critical barrier to effective monocular visual odometry. In this paper, we present CodedVO, a novel monocular visual odometry method that overcomes the scale ambiguity problem by employing custom optics to physically encode metric depth information into imagery. By incorporating this information into our odometry pipeline, we achieve state-of-the-art performance in monocular visual odometry with a known scale. We evaluate our method in diverse indoor environments and demonstrate its robustness and adaptability. On the ICL-NUIM indoor odometry dataset, we achieve an average trajectory error of 0.08 m.
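
As a rough illustration of how a trajectory error of this kind is typically computed, the sketch below evaluates one common definition: the root-mean-square of the translational differences between time-associated estimated and ground-truth positions. It assumes the two trajectories are already aligned and associated frame by frame; the alignment step used by standard evaluation tools is omitted, and this is not the paper's evaluation script. The function name and the toy trajectories are hypothetical.

```python
import math

def ate_rmse(estimated, ground_truth):
    """RMSE of per-frame position differences between two aligned trajectories (metres)."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pos, gt_pos))
        for est_pos, gt_pos in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy trajectories (illustrative values only): a straight line with small drift.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, -0.1)]

error_m = ate_rmse(est, gt)
print(f"ATE RMSE: {error_m:.3f} m")
```

Because the error is expressed in metres, it is only meaningful when the estimated trajectory has metric scale, which is what the coded depth provides.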

Paper Structure

This paper contains 16 sections, 4 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: System Overview. Our proposed approach uses a coded aperture to predict dense metric depth maps, tailored for monocular odometry estimation, from a single RGB sensor.
  • Figure 2: Metric Depth Prediction Framework. Our proposed method consists of two parts: (a) a Coded Simulator and (b) a Depth Estimation Network. The coded simulator uses RGB-D ground truth input and PSFs to simulate coded blurred RGB images using Eq. \ref{eq:coded}. The depth network learns to predict metric depth from an image $\mathcal{I}_c$ captured by a calibrated coded camera.
  • Figure 3: Qualitative Evaluation: Depth Prediction. Depth comparison with existing metric depth estimation methods on different datasets (top to bottom): UMD-CodedVO (DiningRoom and Corridor), iBIM-1 [10.1007/978-3-030-11015-4_25], ICL-NUIM (lr) [handa:etal:ICRA2014], and ICL-NUIM (of) [handa:etal:ICRA2014].
  • Figure 4: Trajectory comparison on ICL-NUIM [handa:etal:ICRA2014] (of-krt2 and lr-krt2) and UMD-CodedVO (Dining and Corridor). We compare the performance of ORB-SLAM2 when its depth input comes from ZoeDepth versus our Coded Depth ($\mathcal{L}_{dw}$).
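
The coded simulator described in the Figure 2 caption (RGB-D ground truth plus PSFs in, coded blurred RGB image out) can be sketched with a simple layered model: quantize the depth map into layers, blur each layer with that layer's PSF, and sum the results. This is a minimal sketch under stated assumptions, not the paper's image formation equation, which is not reproduced in this summary. In particular, the box-shaped PSFs, the depth bin edges, sharing one PSF across colour channels, and the function names are all placeholders; a real phase-mask PSF would come from calibration or optical simulation.

```python
import numpy as np

def box_psf(radius):
    """Placeholder PSF: a normalized (2r+1) x (2r+1) box kernel."""
    size = 2 * radius + 1
    return np.full((size, size), 1.0 / size**2)

def convolve2d_same(channel, psf):
    """Same-size 2D correlation with edge padding (equals convolution for symmetric PSFs)."""
    r = psf.shape[0] // 2
    padded = np.pad(channel, r, mode="edge")
    h, w = channel.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(psf.shape[0]):
        for dx in range(psf.shape[1]):
            out += psf[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def simulate_coded_image(rgb, depth, depth_edges, psfs):
    """Blur each depth layer of an RGB image with its own PSF and sum the layers."""
    # Assign every pixel to a depth layer; out-of-range depths go to the nearest layer.
    layer_idx = np.clip(np.digitize(depth, depth_edges) - 1, 0, len(psfs) - 1)
    coded = np.zeros(rgb.shape, dtype=float)
    for k, psf in enumerate(psfs):
        mask = (layer_idx == k).astype(float)
        for c in range(rgb.shape[2]):
            coded[..., c] += convolve2d_same(rgb[..., c] * mask, psf)
    return coded

# Synthetic RGB-D input (illustrative only): random texture, depth ramp from 0.5 m to 5.5 m.
rng = np.random.default_rng(0)
rgb = rng.random((16, 16, 3))
depth = np.tile(np.linspace(0.5, 5.5, 16), (16, 1))
depth_edges = [0.0, 2.0, 4.0, 6.0]            # hypothetical layer boundaries in metres
psfs = [box_psf(0), box_psf(1), box_psf(2)]   # blur grows with depth in this toy setup

coded = simulate_coded_image(rgb, depth, depth_edges, psfs)
print(coded.shape)
```

The layered sum ignores occlusion effects at depth discontinuities, which keeps the sketch short; image and depth pairs produced this way would serve as training input and target for a depth network like the one in part (b) of the figure.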