RePoseD: Efficient Relative Pose Estimation With Known Depth Information

Yaqing Ding, Viktor Kocur, Václav Vávra, Zuzana Berger Haladová, Jian Yang, Torsten Sattler, Zuzana Kukelova

TL;DR

RePoseD addresses relative pose estimation by leveraging monocular depth maps with unknown scale and shift. It introduces minimal solvers that jointly estimate depth-scale/shift parameters and camera pose across calibrated, shared-focal, and different-focal setups, achieving superior speed and accuracy in many scenarios. The approach consistently outperforms depth-agnostic and some depth-aware baselines when high-quality depth estimates are available, while preserving robustness to depth noise. Practically, RePoseD offers guidance on solver choice depending on depth quality and camera configuration, with broad applicability to SfM, localization, and robotics tasks.

Abstract

Recent advances in monocular depth estimation (MDE) methods and their improved accuracy open new possibilities for their applications. In this paper, we investigate how monocular depth estimates can be used for relative pose estimation. In particular, we are interested in answering the question of whether using MDEs improves results over traditional point-based methods. We propose a novel framework for estimating the relative pose of two cameras from point correspondences with associated monocular depths. Since depth predictions are typically defined up to an unknown scale, or even up to both an unknown scale and an unknown shift, our solvers jointly estimate the scale, or both the scale and shift parameters, along with the relative pose. We derive efficient solvers considering different types of depths for three camera configurations: (1) two calibrated cameras, (2) two cameras with an unknown shared focal length, and (3) two cameras with unknown different focal lengths. Our new solvers outperform state-of-the-art depth-aware solvers in terms of speed and accuracy. In extensive experiments on real data from multiple datasets and with various MDEs, we discuss which depth-aware solvers are preferable in which situation. The code will be made publicly available.
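To illustrate why per-point depths simplify the problem, here is a minimal sketch that is not one of the paper's solvers. It assumes the calibrated case with depths known up to scale only, so that the back-projected points of the two views are related by a similarity transform, and recovers that transform in closed form with the classical Umeyama alignment. The function names, the use of normalized image coordinates, and the synthetic data are assumptions made for illustration; depth shift and unknown focal lengths are not handled.

```python
import numpy as np


def backproject(x, depth):
    """Lift normalized image points (N, 2) to 3D points (N, 3) using per-point depths (N,)."""
    homogeneous = np.hstack([x, np.ones((len(x), 1))])
    return homogeneous * depth[:, None]


def align_similarity(X1, X2):
    """Umeyama alignment: find c, R, t minimizing sum ||X2 - (c * R @ X1 + t)||^2."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    Y1, Y2 = X1 - mu1, X2 - mu2
    cov = Y2.T @ Y1 / len(X1)              # 3x3 cross-covariance of the centered points
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(U) * np.linalg.det(Vt))  # guard against reflections
    R = U @ S @ Vt
    c = np.trace(np.diag(D) @ S) / (Y1 ** 2).sum(axis=1).mean()
    t = mu2 - c * R @ mu1
    return c, R, t


if __name__ == "__main__":
    # Synthetic, noise-free example (all values are made up for illustration).
    rng = np.random.default_rng(0)
    angle = 0.3
    R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                       [0.0, 1.0, 0.0],
                       [-np.sin(angle), 0.0, np.cos(angle)]])
    t_true = np.array([0.5, -0.1, 0.2])
    c_true = 2.5                            # relative scale between the two depth maps
    X1 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(5, 3))
    X2 = c_true * X1 @ R_true.T + t_true

    # What a matcher plus a depth network would provide: image points and depths.
    x1, d1 = X1[:, :2] / X1[:, 2:], X1[:, 2]
    x2, d2 = X2[:, :2] / X2[:, 2:], X2[:, 2]

    c, R, t = align_similarity(backproject(x1, d1), backproject(x2, d2))
    print("scale error:", abs(c - c_true))
    print("rotation error:", np.abs(R - R_true).max())
    print("translation error:", np.abs(t - t_true).max())
```

In this simplified setting three non-collinear correspondences already determine the transform, whereas a depth-agnostic calibrated solver needs five point matches. An unknown shift or unknown focal lengths add further unknowns, which is the situation the paper's solvers address.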

Paper Structure (19 sections, 23 equations, 3 figures, 15 tables)

Figures (3)

  • Figure 1: Given point matches and their corresponding depths from a pair of images, we propose the use of different minimal solvers for relative pose estimation, according to the properties of the depth data. For local optimization, the Sampson error is generally more robust across varying depth conditions, while the reprojection error may yield better results when the depth measurements are highly reliable.
  • Figure 2: Speed-accuracy evaluation for two cameras with a shared unknown focal length, using RoMA matches [edstedt2024roma] and UniDepth depths [piccinelli2024unidepth] on the ETH3D dataset [schops2017multi]. We evaluated mAA and runtime ($\tau$) for each method running for 50, 100, 200, 500 and 1000 iterations, with the exception of Mast3r [leroy2024grounding], which ran for a maximum of 500 iterations.
  • Figure 3: Speed-accuracy evaluation for two cameras with different unknown focal lengths, using SP+LG matches [detone2018superpoint, lindenberger2023lightglue] and MoGe depths [wang2024moge] on the Phototourism dataset [Jin2020]. We evaluated mAA and runtime ($\tau$) for each method running for 50, 100, 200, 500 and 1000 iterations within PoseLib [poselib].
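
The captions above report mAA without defining it. As a rough sketch, assuming one common convention (the mean, over integer thresholds from 1° to 10°, of the fraction of image pairs whose pose error falls below the threshold), it could be computed as follows. The function name, the threshold range, and the example errors are assumptions, not taken from the paper.

```python
import numpy as np


def mean_average_accuracy(pose_errors_deg, max_threshold_deg=10):
    """Average, over thresholds 1..max_threshold_deg, of the share of errors below each threshold."""
    errors = np.asarray(pose_errors_deg, dtype=float)
    thresholds = np.arange(1, max_threshold_deg + 1)
    accuracies = [(errors < th).mean() for th in thresholds]
    return float(np.mean(accuracies))


if __name__ == "__main__":
    # Hypothetical per-pair pose errors in degrees.
    print(mean_average_accuracy([0.5, 2.5, 20.0]))
```

Under this convention a pair with a very large error contributes nothing at any threshold, so the score rewards both how many pairs are solved and how precisely.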