From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation

Javier Tirado-Garín; Javier Civera

TL;DR

This paper shows that it is possible to directly estimate the correct relative camera pose from correspondences without needing a post-processing step to enforce the cheirality constraint on the correspondences.

Abstract

Estimating the relative camera pose from $n \geq 5$ correspondences between two calibrated views is a fundamental task in computer vision. This process typically involves two stages: 1) estimating the essential matrix between the views, and 2) disambiguating among the four candidate relative poses that satisfy the epipolar geometry. In this paper, we demonstrate a novel approach that, for the first time, bypasses the second stage. Specifically, we show that it is possible to directly estimate the correct relative camera pose from correspondences without needing a post-processing step to enforce the cheirality constraint on the correspondences. Building on recent advances in certifiable non-minimal optimization, we frame the relative pose estimation as a Quadratically Constrained Quadratic Program (QCQP). By applying the appropriate constraints, we ensure the estimation of a camera pose that corresponds to a valid 3D geometry and that is globally optimal when certified. We validate our method through exhaustive synthetic and real-world experiments, confirming the efficacy, efficiency and accuracy of the proposed approach. Code is available at https://github.com/javrtg/C2P.
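A QCQP needs a quadratic objective in the unknowns. The sketch below shows how such an objective can be built from the correspondences. It assumes the following:

- The cost is the sum of squared algebraic epipolar errors $\sum_i(\mathbf{f}_{0,i}^\top\mathbf{E}\mathbf{f}_{1,i})^2$.
- $\mathbf{e}$ is $\mathbf{E}$ flattened row by row.

Under these assumptions, the cost equals $\mathbf{e}^\top\mathbf{C}\mathbf{e}$ for a $9\times9$ data matrix $\mathbf{C}$. Only the objective is illustrated. The constraints and any additional variables of the paper's QCQP are omitted, and the function names are hypothetical.

```python
import numpy as np


def epipolar_data_matrix(f0: np.ndarray, f1: np.ndarray) -> np.ndarray:
    """Return the 9x9 PSD matrix C with e @ C @ e == sum_i (f0_i^T E f1_i)^2.

    Hypothetical helper. f0, f1: (n, 3) arrays of unit bearing vectors in
    views 0 and 1; e is E flattened in row-major order, i.e. E.ravel().
    """
    # Each residual f0^T E f1 equals <outer(f0, f1), E>, a linear form in e.
    A = np.einsum("ni,nj->nij", f0, f1).reshape(len(f0), 9)
    return A.T @ A


def epipolar_cost(E: np.ndarray, f0: np.ndarray, f1: np.ndarray) -> float:
    """Direct evaluation of the sum of squared algebraic epipolar errors."""
    residuals = np.einsum("ni,ij,nj->n", f0, E, f1)
    return float(residuals @ residuals)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f0 = rng.normal(size=(20, 3))
    f0 /= np.linalg.norm(f0, axis=1, keepdims=True)
    f1 = rng.normal(size=(20, 3))
    f1 /= np.linalg.norm(f1, axis=1, keepdims=True)
    E = rng.normal(size=(3, 3))
    C = epipolar_data_matrix(f0, f1)
    print(np.isclose(E.ravel() @ C @ E.ravel(), epipolar_cost(E, f0, f1)))  # True
```

$\mathbf{C}=\mathbf{A}^\top\mathbf{A}$ is positive semidefinite by construction. This is what makes the objective convex in $\mathbf{e}$ before the nonconvex constraints are added.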

Paper Structure (18 sections, 3 theorems, 37 equations, 10 figures, 1 algorithm)

Introduction
Related Work
Non-minimal solver for the relative pose
Necessary and sufficient constraints
QCQP
SDP relaxation and optimization
Relative pose recovery
Experiments
Conclusion and Limitations
Additional details
Averaging of data-dependent constraints
Appropriate scaling of the solution estimates
Pure rotations and numerical accuracy
Tightness of zhao2022nonmin when the SDP solution is rank-2
Algebraic derivation of Eq. (eq:midpoints)

Key Result

Theorem 3.1

A real $3\times3$ matrix, $\mathbf{E}$, is an element of $\mathcal{M}_{\mathbf{E}}$ if and only if it satisfies: for two vectors $\mathbf{t},\mathbf{q}\in\mathcal{S}^2$ and where $\mathop{\mathrm{Adj}}\nolimits(\mathbf{E})$ represents the adjugate matrix of $\mathbf{E}$ horn2012matrix.
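The sketch below makes the objects in Theorem 3.1 concrete; it is not a restatement of the theorem. It numerically checks standard identities of a normalized essential matrix $\mathbf{E}=[\mathbf{t}]_\times\mathbf{R}$ with $\lVert\mathbf{t}\rVert=1$ and $\mathbf{q}=\mathbf{R}^\top\mathbf{t}$:

- $\mathbf{E}\mathbf{E}^\top=\mathbf{I}-\mathbf{t}\mathbf{t}^\top$
- $\mathbf{E}^\top\mathbf{E}=\mathbf{I}-\mathbf{q}\mathbf{q}^\top$
- $\mathop{\mathrm{Adj}}(\mathbf{E})=\mathbf{q}\mathbf{t}^\top$

The last identity follows from $\mathop{\mathrm{Adj}}(\mathbf{A}\mathbf{B})=\mathop{\mathrm{Adj}}(\mathbf{B})\mathop{\mathrm{Adj}}(\mathbf{A})$, $\mathop{\mathrm{Adj}}(\mathbf{R})=\mathbf{R}^\top$ and $\mathop{\mathrm{Adj}}([\mathbf{t}]_\times)=\mathbf{t}\mathbf{t}^\top$. The convention $\mathbf{E}=[\mathbf{t}]_\times\mathbf{R}$ and all helper names are assumptions.

```python
import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """Cross-product matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def adjugate3(M: np.ndarray) -> np.ndarray:
    """Adjugate of a 3x3 matrix, valid even when M is singular (e.g. an essential matrix).

    Its rows are cross products of the columns of M, so adjugate3(M) @ M == det(M) * I.
    """
    c0, c1, c2 = M[:, 0], M[:, 1], M[:, 2]
    return np.stack([np.cross(c1, c2), np.cross(c2, c0), np.cross(c0, c1)])


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: random rotation from the QR factorization of a Gaussian matrix."""
    Q, Rq = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q @ np.diag(np.sign(np.diag(Rq)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1  # flip one column so that det(Q) = +1
    return Q
```

For example, with `E = skew(t) @ R` and `q = R.T @ t`, `adjugate3(E)` agrees with `np.outer(q, t)` to machine precision. The determinant-times-inverse formula for the adjugate cannot be used here, because $\det(\mathbf{E})=0$.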

Figures (10)

Figure 1: Relative pose directly from matches, without posterior disambiguation and pure rotation checks. Traditionally, estimating the relative pose involves two steps: 1) Estimating the essential matrix $\mathbf{E}$ using an approximate or globally-optimal solver, and 2) disambiguating the unique geometrically valid pose among four candidate relative poses, with an additional step to determine if the motion is purely rotational. In this paper, we introduce C2P, a globally-optimal and certifiable approach that, for the first time, solves the relative pose problem in a single step.
Figure 2: Necessary geometric conditions. When removing the rotational flow between the bearing vectors kneip2012finding, i.e. considering $\mathbf{f}_0$ and $\mathbf{R}\mathbf{f}_1$, they must exhibit the same (counter-)clockwise turn w.r.t. the translation, Eq. (eq:rot_dis2). Otherwise, the rotation must be a reflected version, $\mathbf{R}_{\pi}$, of the true rotation, $\mathbf{R}$. Given the correct rotation, $\mathbf{f}_0$ must have a greater projection onto the translation than $\mathbf{R}\mathbf{f}_1$, Eq. (eq:dis_t2). Otherwise, the bearings would not meet along the direction of their rays, implying that the translation is flipped ($-\mathbf{t}$) w.r.t. the correct one, $\mathbf{t}$. Therefore, besides avoiding triangulation, these constraints completely disambiguate the relative pose. They are also generally applicable to central camera models, since they do not rely on traditional positive-depth constraints.
Figure 3: Automatic disambiguation of the relative pose. Our method restricts the set of possible rotations and translations (unit vectors, due to scale ambiguity) for solving the relative pose by incorporating cheirality constraints in the optimization. We visualize this for the translation with cost maps of squared epipolar errors in the tangent space at the ground-truth translation, $\mathcal{T}_{\mathbf{t}}\mathcal{S}^2$, for different levels of noise. Elements of $\mathcal{T}_{\mathbf{t}}\mathcal{S}^2$ are mapped to the sphere along geodesics using the exponential map, which is bijective for $\lVert\mathbf{v}\rVert<\pi$ with $\mathbf{v}\in\mathcal{T}_{\mathbf{t}}\mathcal{S}^2$ (every $\mathbf{v}$ with $\lVert\mathbf{v}\rVert=\pi$ maps to $-\mathbf{t}$) boumal2023intromanifolds. This enables us to show, on the right and with lower opacity, the region that does not satisfy the constraint of Eq. (eq:dis_t2), named $\mathcal{R}$. As can be seen, the global minimum corresponding to $\mathbf{t}$ always lies within the unrestricted region, named $\mathcal{U}$, whereas $\mathcal{U}$ excludes the global minimum corresponding to $-\mathbf{t}$. Therefore, the solver automatically selects the translation with the correct sign as the solution.
Figure 4: The solution is found in the dominant singular vector. We show boxplots of the ratios $\sigma_{\text{sol}}/\sigma_0$ (left box), $\sigma_{\text{sol}}/\sigma_1$ (middle box) and $\sigma_{\text{sol}}/\sigma_2$ (right box), across different levels of noise and numbers of correspondences (each experiment is repeated 100 times). $\sigma_{\text{sol}}$ is the singular value whose singular vector contains the estimate closest to the ground truth. $\sigma_0, \sigma_1, \sigma_2$ are, in order ($\sigma_0>\sigma_1>\sigma_2$), the top three singular values; the rest are close to zero. As can be seen, the solution vector consistently corresponds to the dominant singular vector, i.e. $\sigma_{\text{sol}}/\sigma_0=1$. Therefore, we can directly select the dominant singular vector to recover the solution.
Figure 5: Run time vs. number of correspondences. We compare the execution time (in s) of C2P against zhao2022nonmin and garciasalguero2022tighter. Unlike C2P, zhao2022nonmin and garciasalguero2022tighter need a post-processing step to disambiguate among the four candidate poses. For this, we use two methods:

- (T) the classic cheirality check Hartley2004mvg, which triangulates the points and checks for positive depths;
- (M) a faster alternative that avoids triangulation and instead checks Eq. (eq:midpoints).

C2P-fast and zhao2022nonmin + M are the fastest when the number of correspondences is low, and using redundant constraints (C2P) adds only a small difference ($<2$ ms). However, for $n>10^{3}$ (common with dense matchers truong2023pdcnetpami, Edstedt2023dkm, edstedt2023roma), the disambiguation step starts to dominate the runtime of zhao2022nonmin + M. In that regime, both versions of our method (C2P and C2P-fast) are up to 4x faster than the fastest alternative and up to 35x faster than the slowest one.
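The two conditions of Figure 2 can be written as sign tests on each correspondence. The sketch below is a minimal illustration under these assumptions:

- Bearings are unit vectors.
- Points satisfy $\mathbf{X}_0=\mathbf{R}\mathbf{X}_1+\mathbf{t}$, so that $\mathbf{f}_0^\top[\mathbf{t}]_\times\mathbf{R}\mathbf{f}_1=0$.
- The turn condition reads $(\mathbf{t}\times\mathbf{f}_0)^\top(\mathbf{t}\times\mathbf{R}\mathbf{f}_1)>0$ and the projection condition reads $\mathbf{f}_0^\top\mathbf{t}>(\mathbf{R}\mathbf{f}_1)^\top\mathbf{t}$. This is our own reading of the conditions and may differ in form from Eqs. (eq:rot_dis2) and (eq:dis_t2).

The function `satisfies_cheirality` and its `eps` threshold are hypothetical. Here the conditions act as a post-hoc test on one candidate pose, whereas C2P imposes them as constraints inside the optimization.

```python
import numpy as np


def satisfies_cheirality(R, t, f0, f1, eps=0.0):
    """Hypothetical triangulation-free check of the two conditions sketched in Figure 2.

    Assumed convention: X0 = R @ X1 + t, with unit bearings f0, f1 of shape (n, 3).
    Returns one boolean per correspondence.
    """
    g = f1 @ R.T  # rows are R @ f1_i, i.e. bearings with the rotational flow removed
    # Same (counter-)clockwise turn of f0 and R f1 around t: rules out the reflected rotation.
    same_turn = np.einsum("ni,ni->n", np.cross(t, f0), np.cross(t, g)) > eps
    # f0 projects more onto t than R f1: rules out the flipped translation -t.
    proj_order = f0 @ t - g @ t > eps
    return same_turn & proj_order
```

With noise-free data, only the true pose passes for every correspondence. The pair $(\mathbf{R},-\mathbf{t})$ violates the projection test. The reflected rotation $\mathbf{R}_\pi=(2\mathbf{t}\mathbf{t}^\top-\mathbf{I})\mathbf{R}$ violates the turn test for either sign of $\mathbf{t}$.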
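Figure 3 maps tangent vectors at $\mathbf{t}$ to the sphere with the exponential map. For the unit sphere this map is $\exp_{\mathbf{t}}(\mathbf{v})=\cos(\lVert\mathbf{v}\rVert)\,\mathbf{t}+\sin(\lVert\mathbf{v}\rVert)\,\mathbf{v}/\lVert\mathbf{v}\rVert$. The hedged sketch below evaluates such a cost map under two assumptions:

- The rotation is fixed to a known $\mathbf{R}$ and $\mathbf{E}=[\mathbf{t}]_\times\mathbf{R}$.
- The error is the unnormalized algebraic term $\mathbf{f}_0^\top\mathbf{E}\mathbf{f}_1$.

The helper names and grid parameters are hypothetical and not taken from the C2P code.

```python
import numpy as np


def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def exp_sphere(t, v):
    """Exponential map of the unit sphere at t, for a tangent vector v (v @ t == 0)."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return t.copy()
    return np.cos(n) * t + np.sin(n) * (v / n)


def tangent_basis(t):
    """Two orthonormal vectors spanning the tangent plane at t."""
    a = np.array([1.0, 0.0, 0.0]) if abs(t[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    b1 = np.cross(t, a)
    b1 /= np.linalg.norm(b1)
    return b1, np.cross(t, b1)


def epipolar_cost_t(t, R, f0, f1):
    """Sum of squared algebraic epipolar errors f0^T [t]_x R f1 for a candidate t."""
    r = np.einsum("ni,ij,nj->n", f0, skew(t) @ R, f1)
    return float(r @ r)


def cost_map(t_gt, R, f0, f1, radius=np.pi, steps=41):
    """Cost on a square grid of tangent vectors; NaN outside the disc ||v|| < radius."""
    b1, b2 = tangent_basis(t_gt)
    grid = np.linspace(-radius, radius, steps)
    out = np.full((steps, steps), np.nan)
    for i, x in enumerate(grid):
        for j, y in enumerate(grid):
            v = x * b1 + y * b2
            if np.linalg.norm(v) < radius:
                out[i, j] = epipolar_cost_t(exp_sphere(t_gt, v), R, f0, f1)
    return out
```

The cost is quadratic in $\mathbf{t}$, so it takes the same value at $\mathbf{t}$ and $-\mathbf{t}$. The cost alone therefore cannot fix the sign; this is the ambiguity that restricting the search to $\mathcal{U}$ resolves.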
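Figure 4 motivates reading the estimate off the dominant singular vector of the solution matrix. The sketch below shows that generic rank-one extraction. It assumes a symmetric positive semidefinite matrix $\mathbf{X}\approx\mathbf{x}\mathbf{x}^\top$ plus smaller components. It does not model which entries of $\mathbf{x}$ hold $\mathbf{E}$, $\mathbf{t}$ or $\mathbf{q}$, nor how they are rescaled. The function name is hypothetical.

```python
import numpy as np


def dominant_solution(X: np.ndarray):
    """Hypothetical helper: recover x from a PSD matrix X ~ x x^T + (smaller terms).

    Returns (x_hat, singular_values), where x_hat = sqrt(sigma_0) * u_0 and
    (sigma_0, u_0) is the top singular pair. x_hat is defined up to a global sign.
    """
    # For symmetric input, hermitian=True lets NumPy use an eigendecomposition;
    # singular values are returned in descending order.
    U, S, _ = np.linalg.svd(X, hermitian=True)
    return np.sqrt(S[0]) * U[:, 0], S
```

The recovered vector is defined only up to a global sign. A sign convention, or a fixed homogenizing entry in the lifted vector, is therefore still needed when mapping it back to pose variables.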
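For reference, the classic pipeline labeled (T) in Figure 5 first factorizes $\mathbf{E}$ into four candidates. It then keeps the candidate that places triangulated points in front of both cameras Hartley2004mvg. The sketch below uses the standard SVD factorization, $\mathbf{R}\in\{\mathbf{U}\mathbf{W}\mathbf{V}^\top,\mathbf{U}\mathbf{W}^\top\mathbf{V}^\top\}$ with $\mathbf{t}=\pm\mathbf{u}_3$, followed by midpoint triangulation. The midpoint choice, the convention $\mathbf{X}_0=\mathbf{R}\mathbf{X}_1+\mathbf{t}$ and all function names are assumptions, not the implementation benchmarked in the figure.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])


def four_candidates(E):
    """Hypothetical helper: classic SVD factorization of E into four (R, t) candidates."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U  # flips the sign of E only, which leaves the epipolar geometry unchanged
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    t = U[:, 2]
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]


def midpoint_depths(R, t, f0, f1):
    """Depths (lam0, lam1) minimizing ||lam0 f0 - lam1 R f1 - t|| for each correspondence."""
    g = f1 @ R.T
    c = np.einsum("ni,ni->n", f0, g)
    a, b = f0 @ t, g @ t
    den = 1.0 - c**2  # vanishes only for parallel rays
    return (a - c * b) / den, (c * a - b) / den


def select_pose_by_triangulation(E, f0, f1):
    """Keep the candidate with the most points at positive depth in both views."""
    best, best_count = None, -1
    for R, t in four_candidates(E):
        lam0, lam1 = midpoint_depths(R, t, f0, f1)
        count = int(np.sum((lam0 > 0) & (lam1 > 0)))
        if count > best_count:
            best, best_count = (R, t), count
    return best, best_count
```

The selection counts points with positive depth in both views rather than requiring all of them, a common way to tolerate noisy matches. This per-correspondence triangulation is the step that C2P avoids.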

Theorems & Definitions (6)

Theorem 3.1
proof
Theorem 4.1
proof
Theorem B.1
proof
