
An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation

Zewen Xu, Yijia He, Hao Wei, Bo Xu, BinJian Xie, Yihong Wu

TL;DR

RT$^2$PL introduces a real-time three-view pose solver that decouples rotation and translation estimation using point-line observations. Rotation is estimated via NEC for points and NBC for lines; both constraints incorporate observation uncertainty and are solved with Levenberg-Marquardt and IRLS weighting. Translations are then obtained through low-degree LiGT constraints for both points and lines. The approach achieves higher accuracy than trifocal-tensor methods and two-view baselines and remains robust under degeneracies such as pure rotation and planar configurations. Extensive synthetic and real-world experiments demonstrate reliable, fast pose estimation and effective fusion of point and line features, highlighting the method's practical relevance for VO/SfM in weak-texture environments.
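The claim that the point constraint involves only rotation can be checked numerically. The sketch below is a minimal illustration, assuming the unweighted eigenvalue form of the normal epipolar constraint from kneip2013direct: each correspondence gives an epipolar-plane normal $\boldsymbol{n}_i = {}^{0}\boldsymbol{f}_i \times \boldsymbol{R}_{01}\,{}^{1}\boldsymbol{f}_i$, and the cost is the smallest eigenvalue of $\boldsymbol{M} = \sum_i \boldsymbol{n}_i \boldsymbol{n}_i^\top$. It leaves out the uncertainty weighting, IRLS, and LM optimization mentioned above. The function names (`nec_cost`, `axis_angle`, ...), the pose convention ${}^{0}\boldsymbol{X} = \boldsymbol{R}_{01}\,{}^{1}\boldsymbol{X} + \boldsymbol{t}_{01}$, and the synthetic scene are hypothetical, not the paper's implementation.

```python
import math

# Minimal 3-vector / 3x3 helpers on plain tuples (no external dependencies).
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def matvec(R, v):
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def transpose(R):
    return tuple(tuple(R[j][i] for j in range(3)) for i in range(3))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def axis_angle(axis, theta):
    """Rodrigues' formula: rotation by theta radians about the given axis."""
    k = normalize(axis)
    K = ((0.0, -k[2], k[1]), (k[2], 0.0, -k[0]), (-k[1], k[0], 0.0))
    c, s = math.cos(theta), math.sin(theta)
    return tuple(tuple((c if i == j else 0.0) + s * K[i][j] + (1.0 - c) * k[i] * k[j]
                       for j in range(3)) for i in range(3))

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def smallest_eigenvalue(A):
    """Smallest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric method)."""
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return min(A[0][0], A[1][1], A[2][2])
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    p = math.sqrt(((A[0][0] - q) ** 2 + (A[1][1] - q) ** 2
                   + (A[2][2] - q) ** 2 + 2.0 * p1) / 6.0)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    r = max(-1.0, min(1.0, det3(B) / 2.0))
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)

def nec_cost(R01, f0s, f1s):
    """Smallest eigenvalue of M = sum_i n_i n_i^T, with n_i = f0_i x (R01 f1_i).

    Every epipolar-plane normal is orthogonal to the baseline, so for the true
    R01 the normals are coplanar and the cost is zero on noise-free data.
    The translation is never used.
    """
    M = [[0.0] * 3 for _ in range(3)]
    for f0, f1 in zip(f0s, f1s):
        n = cross(f0, matvec(R01, f1))
        for i in range(3):
            for j in range(3):
                M[i][j] += n[i] * n[j]
    return smallest_eigenvalue(M)

# Hypothetical noise-free two-view scene, assuming X0 = R01 X1 + t01.
R01 = axis_angle((0.2, 1.0, 0.1), 0.15)
t01 = (1.0, 0.2, 0.1)
points0 = [(0.5, -0.3, 4.0), (-1.0, 0.8, 5.0), (1.2, 1.0, 6.0), (-0.7, -1.1, 4.5),
           (0.1, 0.4, 7.0), (2.0, -0.5, 5.5), (-1.5, 0.2, 6.5), (0.8, 1.5, 4.2)]
f0s = [normalize(X) for X in points0]
f1s = [normalize(matvec(transpose(R01), tuple(X[i] - t01[i] for i in range(3))))
       for X in points0]

cost_true = nec_cost(R01, f0s, f1s)
R_wrong = matmul(axis_angle((0.3, -0.4, 0.85), math.radians(5.0)), R01)
cost_wrong = nec_cost(R_wrong, f0s, f1s)
print(f"NEC cost: true R = {cost_true:.2e}, R off by 5 deg = {cost_wrong:.2e}")
```

In a full solver this cost, or an uncertainty-weighted variant of it, would be minimized over $\boldsymbol{R}_{01}$ (for example with LM); the translation never enters the objective, which is what makes the decoupling possible.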

Abstract

Line features are valuable complements to point features in man-made environments. 3D-2D constraints provided by line features have been widely used in Visual Odometry (VO) and Structure-from-Motion (SfM) systems. However, accurately solving three-view relative motion in real time from only 2D observations of points and lines has not been fully explored. In this paper, we propose a novel three-view pose solver based on rotation-translation decoupled estimation. First, we propose a high-precision rotation estimation method based on normal vector coplanarity constraints that account for observation uncertainty; it can be solved efficiently with the Levenberg-Marquardt (LM) algorithm. Second, we design a robust linear translation constraint that minimizes the degree of the rotation components and feature observation components in its equations, enabling accurate translation estimation. Experiments on synthetic and real-world data show that the proposed approach improves both rotation and translation accuracy over the classical trifocal-tensor-based method and the state-of-the-art two-view algorithm in both outdoor and indoor environments.
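Once rotation is fixed, translation estimation reduces to a linear problem. The paper's LiGT-style three-view translation constraints are not reproduced here; as a simpler stand-in, this sketch uses the standard two-view epipolar constraint ${}^{0}\boldsymbol{f}_i^\top [\boldsymbol{t}_{01}]_\times \boldsymbol{R}_{01}\,{}^{1}\boldsymbol{f}_i = 0$, which for known $\boldsymbol{R}_{01}$ becomes $\boldsymbol{a}_i^\top \boldsymbol{t}_{01} = 0$ with $\boldsymbol{a}_i = \boldsymbol{R}_{01}\,{}^{1}\boldsymbol{f}_i \times {}^{0}\boldsymbol{f}_i$. The translation direction is then the eigenvector of $\sum_i \boldsymbol{a}_i \boldsymbol{a}_i^\top$ with the smallest eigenvalue. The function names, the pose convention ${}^{0}\boldsymbol{X} = \boldsymbol{R}_{01}\,{}^{1}\boldsymbol{X} + \boldsymbol{t}_{01}$, and the synthetic data are hypothetical.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(R, v):
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def transpose(R):
    return tuple(tuple(R[j][i] for j in range(3)) for i in range(3))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def axis_angle(axis, theta):
    """Rodrigues' formula: rotation by theta radians about the given axis."""
    k = normalize(axis)
    K = ((0.0, -k[2], k[1]), (k[2], 0.0, -k[0]), (-k[1], k[0], 0.0))
    c, s = math.cos(theta), math.sin(theta)
    return tuple(tuple((c if i == j else 0.0) + s * K[i][j] + (1.0 - c) * k[i] * k[j]
                       for j in range(3)) for i in range(3))

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def smallest_eigenvalue(A):
    """Smallest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric method)."""
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return min(A[0][0], A[1][1], A[2][2])
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    p = math.sqrt(((A[0][0] - q) ** 2 + (A[1][1] - q) ** 2
                   + (A[2][2] - q) ** 2 + 2.0 * p1) / 6.0)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    r = max(-1.0, min(1.0, det3(B) / 2.0))
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)

def smallest_eigenvector(M):
    """Unit eigenvector for the smallest eigenvalue of a symmetric 3x3 matrix.

    The rows of (M - lam*I) are orthogonal to that eigenvector (assuming lam is a
    simple eigenvalue), so the largest cross product of two rows is parallel to it.
    """
    lam = smallest_eigenvalue(M)
    rows = [tuple(M[i][j] - (lam if i == j else 0.0) for j in range(3)) for i in range(3)]
    pairs = [cross(rows[0], rows[1]), cross(rows[0], rows[2]), cross(rows[1], rows[2])]
    return normalize(max(pairs, key=lambda v: dot(v, v)))

def translation_given_rotation(R01, f0s, f1s):
    """Direction of t01 from a_i . t01 = 0, with a_i = (R01 f1_i) x f0_i and R01 known."""
    M = [[0.0] * 3 for _ in range(3)]
    for f0, f1 in zip(f0s, f1s):
        a = cross(matvec(R01, f1), f0)
        for i in range(3):
            for j in range(3):
                M[i][j] += a[i] * a[j]
    return smallest_eigenvector(M)  # unit vector; the sign is still ambiguous

# Hypothetical noise-free two-view scene, assuming X0 = R01 X1 + t01.
R01 = axis_angle((0.2, 1.0, 0.1), 0.15)
t01 = (1.0, 0.2, 0.1)
points0 = [(0.5, -0.3, 4.0), (-1.0, 0.8, 5.0), (1.2, 1.0, 6.0), (-0.7, -1.1, 4.5),
           (0.1, 0.4, 7.0), (2.0, -0.5, 5.5), (-1.5, 0.2, 6.5), (0.8, 1.5, 4.2)]
f0s = [normalize(X) for X in points0]
f1s = [normalize(matvec(transpose(R01), tuple(X[i] - t01[i] for i in range(3))))
       for X in points0]

t_est = translation_given_rotation(R01, f0s, f1s)
print("estimated direction:", t_est, "true direction:", normalize(t01))
```

Here $\boldsymbol{a}_i$ is, up to sign, the epipolar-plane normal of Figure 2(a), so the recovered direction is the vector all those normals are orthogonal to. The result is a unit direction with a sign ambiguity; a complete pipeline would resolve the sign with a positive-depth (cheirality) check and obtain scale from other information.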

Paper Structure (23 sections, 46 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: Results of the proposed method on several difficult scenes. The first three columns show the triple point-line images, where green points denote matched 2D point features and cyan lines denote matched 2D line features. The pose estimation results are shown on the right: ground-truth poses are in solid black, poses estimated by the proposed method (RT$^2$PL) are in orange, poses estimated by PNEC muhle2022probabilistic are in green, and poses estimated by the five-point method stewenius2006recent are in blue. All estimated translations are recovered with true scale for comparison. The cases shown are selected from the CID-SIMS dataset zhang2023cid: the first and third rows are sampled from Seq. floor3_1, and the second and fourth rows are sampled from Seq. 14-13-12. The first two rows show the performance of the pose estimation methods in weak-texture cases, and the last two rows show their performance in planar-degeneracy cases.
  • Figure 2: Geometry of the rotation constraints. (a) NEC: For clarity, only the constraint provided by three point correspondences ($\alpha$, $\beta$, and $\gamma$) in two frames is shown. In the left picture, the projections of the 3D point $\alpha$ (yellow point) onto the two images are ${}^{0}\boldsymbol{f}_\alpha$ and ${}^{1}\boldsymbol{f}_\alpha$, respectively. The plane spanned by point $\alpha$ and the camera optical centers $\boldsymbol{C}_0$ and $\boldsymbol{C}_1$ contains the translation vector. This plane, namely the epipolar plane, can be represented in frame 0 as ${}^{0}\boldsymbol{\pi}_\alpha$, shown as the yellow plane in the right picture, and its normal vector ${}^{0}\boldsymbol{n}_\alpha$ can be obtained by Eq. \ref{eq:noraml_point}. The planes ${}^{0}\boldsymbol{\pi}_\beta$ and ${}^{0}\boldsymbol{\pi}_\gamma$ also contain the translation vector, so all three normal vectors are orthogonal to it and are therefore coplanar. According to Eq. \ref{eq:noraml_point}, Eq. \ref{eq:NEC_M}, and Eq. \ref{eq:point_eigenvalue}, this constraint depends only on the relative rotation $\boldsymbol{R}_{01}$. (b) NBC: For clarity, only the constraint provided by one line across three frames is shown. Each back-projected plane is defined by a 2D line observation and the corresponding camera optical center, like ${}^0\pi_0$, ${}^1\pi_1$, and ${}^2\pi_2$ in the left picture. The back-projected planes of the 2D line correspondences all contain the corresponding 3D line, so their normal vectors are all orthogonal to the line direction and are therefore coplanar. We represent these three normal vectors in frame 1 as ${}^{1}\boldsymbol{n}_0$, ${}^{1}\boldsymbol{n}_1$, and ${}^{1}\boldsymbol{n}_2$, as shown in the right picture; they can be obtained by Eq. \ref{eq:back-projected_normal}. This constraint depends only on the relative rotations $\boldsymbol{R}_{10}$ and $\boldsymbol{R}_{12}$.
Therefore, for both points and lines, rotations can be estimated independently of translations through NEC and NBC.
  • Figure 3: Ablation experiments. The configuration is the same as in Tab. \ref{tab:major_test}. Each value is averaged over 1000 random experiments.
  • Figure 4: Convergence and resilience to outliers. For this experiment, 100 random points and 100 random lines are generated. All algorithms are embedded in a RANSAC scheme with the same outlier threshold and inlier criteria. Following the approach outlined in kneip2013direct, five features form the sample set for 5pt-nist and 5pt-stew, and ten features form the sample set for all non-minimal solvers.
  • Figure 5: Convergence analysis for two NBC forms. (a) Results of RANSAC with random initialization. The configuration is the same as for Fig. \ref{fig:outlier}. (b) Initial-value resilience test. Initial values are offset from the true values by $0^\circ$ to $10^\circ$ to test algorithm convergence without outliers. The noise is fixed at 0.5 pixels.
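The NBC geometry in Figure 2(b) can likewise be checked numerically. This minimal sketch assumes the coplanarity of the three back-projected normals is expressed as their scalar triple product $\det[\,\boldsymbol{R}_{10}\,{}^{0}\boldsymbol{n}_0,\ {}^{1}\boldsymbol{n}_1,\ \boldsymbol{R}_{12}\,{}^{2}\boldsymbol{n}_2\,]$; the paper's exact NBC forms (Figure 5 compares two of them) and its uncertainty weighting are not reproduced. The helper names, the pose convention ${}^{1}\boldsymbol{X} = \boldsymbol{R}_{1k}\,{}^{k}\boldsymbol{X} + \boldsymbol{t}_{1k}$, and the synthetic lines are hypothetical.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(R, v):
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def transpose(R):
    return tuple(tuple(R[j][i] for j in range(3)) for i in range(3))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def axis_angle(axis, theta):
    """Rodrigues' formula: rotation by theta radians about the given axis."""
    k = normalize(axis)
    K = ((0.0, -k[2], k[1]), (k[2], 0.0, -k[0]), (-k[1], k[0], 0.0))
    c, s = math.cos(theta), math.sin(theta)
    return tuple(tuple((c if i == j else 0.0) + s * K[i][j] + (1.0 - c) * k[i] * k[j]
                       for j in range(3)) for i in range(3))

def to_frame(R1k, t1k, X1):
    """Coordinates in frame k of a point given in frame 1, assuming X1 = R1k Xk + t1k."""
    return matvec(transpose(R1k), tuple(X1[i] - t1k[i] for i in range(3)))

def backprojected_normal(P, Q):
    """Unit normal of the plane through the camera center and the 3D line PQ.

    P and Q are two points on the line in that camera's frame; with a real 2D line
    observation one would instead cross two homogeneous image points on the line.
    """
    return normalize(cross(P, Q))

def nbc_residual(R10, R12, n0, n1, n2):
    """Scalar triple product det[R10 n0, n1, R12 n2] of the normals expressed in frame 1.

    All three normals are orthogonal to the 3D line direction, so for the true
    rotations they are coplanar and the residual is zero. No translation appears.
    """
    return dot(matvec(R10, n0), cross(n1, matvec(R12, n2)))

# Hypothetical noise-free three-view scene; frame 1 is the reference frame.
R10, t10 = axis_angle((0.1, 1.0, 0.0), -0.10), (-0.9, 0.1, 0.05)
R12, t12 = axis_angle((0.0, 1.0, 0.2), 0.12), (0.8, -0.1, 0.1)
lines1 = [((0.3, -1.0, 5.0), (0.5, 1.0, 5.5)),
          ((-1.2, -0.8, 6.0), (-0.9, 0.9, 4.8)),
          ((1.5, -0.5, 4.5), (0.6, 0.7, 6.2))]

def residuals(R10_hyp, R12_hyp):
    """NBC residual of every line for hypothesised rotations (observed normals are exact)."""
    out = []
    for P, Q in lines1:
        n0 = backprojected_normal(to_frame(R10, t10, P), to_frame(R10, t10, Q))
        n1 = backprojected_normal(P, Q)
        n2 = backprojected_normal(to_frame(R12, t12, P), to_frame(R12, t12, Q))
        out.append(nbc_residual(R10_hyp, R12_hyp, n0, n1, n2))
    return out

res_true = residuals(R10, R12)
R10_wrong = matmul(axis_angle((0.3, -0.4, 0.85), math.radians(3.0)), R10)
res_wrong = residuals(R10_wrong, R12)
print("true rotations:", res_true)
print("R10 off by 3 deg:", res_wrong)
```

`nbc_residual` takes no translation argument at all: the three normals are coplanar for any camera positions, which is why a line tracked over three views constrains $\boldsymbol{R}_{10}$ and $\boldsymbol{R}_{12}$ independently of the translations. The setup avoids lines parallel to the baseline, for which two back-projected planes coincide and the triple product vanishes for any rotation.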