Towards Rotation-only Imaging Geometry: Rotation Estimation

Xinrui Li, Qi Cai, Yuanxin Wu

TL;DR

This work proposes a rotation-only imaging geometry for SfM by representing translation on the rotation manifold and optimizing reprojection error solely over camera rotations. It introduces PPO constraints, analyzes translation solution spaces under different scene structures, and derives TRRM to perform two-view rotation optimization, plus GRRM for global multi-view rotation estimation. A scene-structure detector identifies RotationSingular configurations to avoid translation degeneration (PR/B/I, Holoplane, RankRegular-line), improving robustness. The approach yields substantial accuracy gains over state-of-the-art rotation estimation methods, with performance approaching four rounds of BA on OpenMVG pipelines, and demonstrates strong two-view and multi-view results on simulations and the Strecha dataset. Overall, the rotation-only framework offers improved efficiency, robustness to noise, and competitive accuracy for 3D visual computing pipelines.

Abstract

Structure from Motion (SfM) is a critical task in computer vision, aiming to recover the 3D scene structure and camera motion from a sequence of 2D images. The recent pose-only imaging geometry decouples 3D coordinates from camera poses and demonstrates significantly better SfM performance through pose adjustment. Continuing the pose-only perspective, this paper explores the relationship among scene structure, rotation, and translation. Notably, the translation can be expressed in terms of rotation, which allows the imaging geometry representation to be condensed onto the rotation manifold. A rotation-only optimization framework based on reprojection error is proposed for both two-view and multi-view scenarios. The experimental results demonstrate superior accuracy and robustness over current state-of-the-art rotation estimation methods, and are even comparable to the results of multiple bundle adjustment iterations. Hopefully, this work contributes to more accurate, efficient, and reliable 3D visual computing.
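
As a rough illustration of the statement that translation can be expressed in terms of rotation, the sketch below uses the classical two-view epipolar constraint, not the paper's own PPO or TRRM formulation. Once a relative rotation is fixed, the translation direction is the null vector of a matrix built from matched bearing vectors. The function names, the convention $\boldsymbol{X}_j = \boldsymbol{R}\boldsymbol{X}_i + \boldsymbol{t}$, and the synthetic scene are all assumptions made for this example.

```python
import numpy as np


def translation_direction_from_rotation(R, bearings_i, bearings_j):
    """Unit translation direction (up to sign) for a given relative rotation.

    Assumes X_j = R @ X_i + t, so every match satisfies the epipolar
    constraint t . ((R @ x_i) x x_j) = 0.
    """
    rotated_i = bearings_i @ R.T               # each row is R @ x_i
    normals = np.cross(rotated_i, bearings_j)  # each row is orthogonal to t
    # The right singular vector of the smallest singular value spans the null space.
    _, _, vt = np.linalg.svd(normals)
    return vt[-1]


def normalize_rows(points):
    """Scale each 3D point to unit length, giving a bearing vector."""
    return points / np.linalg.norm(points, axis=1, keepdims=True)


# Hypothetical noise-free scene: 50 points roughly 5 units in front of camera i.
rng = np.random.default_rng(0)
angle = 0.1  # rotation about the y axis, in radians
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([1.0, 0.2, -0.1])

points_i = rng.uniform(-1.0, 1.0, size=(50, 3)) + np.array([0.0, 0.0, 5.0])
points_j = points_i @ R_true.T + t_true

t_est = translation_direction_from_rotation(
    R_true, normalize_rows(points_i), normalize_rows(points_j))

# Compare directions up to sign, since the null vector has no preferred sign.
alignment = abs(t_est @ (t_true / np.linalg.norm(t_true)))
print(f"|cos| between estimated and true direction: {alignment:.6f}")
```

The direction is recovered only up to sign and scale. If the true translation is zero, every row of the stacked matrix vanishes and no direction can be recovered from it.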

Paper Structure

This paper contains 25 sections, 6 theorems, 92 equations, 9 figures, 4 tables, 4 algorithms.

Key Result

Theorem 1

${{\boldsymbol{P}}_{ijk}} = {{\boldsymbol{\theta }}_{ijk}}{\boldsymbol{\theta }}_{ijk}^T$.
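
Read on its own, and assuming ${\boldsymbol{\theta}}_{ijk}$ is a real column vector (its definition is not given in this summary), the outer-product form already implies a few standard properties for any vector $\boldsymbol{v}$ of matching size:

$$
{\boldsymbol{P}}_{ijk}^T = {\boldsymbol{P}}_{ijk}, \qquad
\boldsymbol{v}^T {\boldsymbol{P}}_{ijk} \boldsymbol{v} = \left( {\boldsymbol{\theta}}_{ijk}^T \boldsymbol{v} \right)^2 \ge 0, \qquad
\operatorname{rank}\left( {\boldsymbol{P}}_{ijk} \right) \le 1, \qquad
{\boldsymbol{P}}_{ijk}^2 = \left\| {\boldsymbol{\theta}}_{ijk} \right\|^2 {\boldsymbol{P}}_{ijk}.
$$

These follow from linear algebra alone and do not depend on how the paper defines ${\boldsymbol{\theta}}_{ijk}$.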

Figures (9)

  • Figure 1: An illustration of reconstruction using the Lund dataset. Views form a connectivity graph $\mathcal{G}$, where nodes of graph include observation and pose information of views, while edges connecting nodes indicate presence of matched observations and visual geometric constraints.
  • Figure 2: Generation of projected point $\boldsymbol{X}_{ijk}^j$, reconstructed along projection ray from $i$-th view and projected to $j$-th view. On image plane, distance between $\boldsymbol{X}_{ijk}^j$ and $\tilde{\boldsymbol{X}}_{jk}$ represents pose-only reprojection residual $\| \boldsymbol{V}_{ijk}^{PA,j} \|$. Bearing vector of pose-only reprojection residual $\| \boldsymbol{V}_{ijk,bearing}^{PA,j} \|$ represents chord distance between $\vec{\boldsymbol{X}}_{ijk}^j$ and $\vec{\tilde{\boldsymbol{X}}}_{jk}$ on unit sphere centered at camera origin.
  • Figure 3: Generation mechanism of $\boldsymbol{V}_{ik}^{{GRRM}}$ (using three views as an example). $\boldsymbol{X}_{ijk}^j$ and $\boldsymbol{X}_{ilk}^i$ represent pose-only reprojection coordinates on camera $i$ of observations $\tilde{\boldsymbol{X}}_{jk}$ and $\tilde{\boldsymbol{X}}_{lk}$ from cameras $j$ and $l$, respectively. Pose-only reprojection residual on imaging plane of camera $i$ and its corresponding bearing vector on unit sphere are illustrated by orange and red lines, respectively. $\boldsymbol{V}_{ik}^{{GRRM}}$ is formed through a linear weighted summation of a series of such pose-only reprojection residuals.
  • Figure 4: (a)-(e) Scene structures in simulation, where cameras are depicted as red models, 3D points distributed throughout space are represented as gray dots, and 3D points observed by cameras are highlighted in green. (f) Recognition rate of our proposed algorithm.
  • Figure 5: Detection value of Castle sub-dataset of Strecha dataset. (a)(b) and (c)(d) differ by an order of magnitude in $v_{\text{rs}}$. Red and yellow points represent matched feature points in left and right views, respectively.
  • ...and 4 more figures
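
The chord distance mentioned in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's residual: the helper names and the two sample image points are hypothetical, and points are taken to be homogeneous coordinates on the normalized image plane.

```python
import numpy as np


def to_bearing(point_h):
    """Scale a homogeneous image point to a unit bearing vector."""
    v = np.asarray(point_h, dtype=float)
    return v / np.linalg.norm(v)


def chord_distance(point_a, point_b):
    """Straight-line distance between two bearings on the unit sphere."""
    return float(np.linalg.norm(to_bearing(point_a) - to_bearing(point_b)))


# Hypothetical projected and observed points on the normalized image plane.
projected = np.array([0.10, -0.05, 1.0])
observed = np.array([0.12, -0.04, 1.0])

chord = chord_distance(projected, observed)

# Angle between the two bearings, computed with arctan2 for numerical stability.
u, v = to_bearing(projected), to_bearing(observed)
angle = np.arctan2(np.linalg.norm(np.cross(u, v)), u @ v)

print(f"chord distance: {chord:.6f}, angle (rad): {angle:.6f}")
```

For unit vectors separated by an angle $\alpha$, the chord length equals $2\sin(\alpha/2)$, so the chord distance stays bounded by 2 regardless of how far apart the two points lie on the image plane.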

Theorems & Definitions (12)

  • Theorem 1
  • proof
  • Corollary 1
  • proof
  • Corollary 2
  • proof
  • Corollary 3
  • proof
  • Corollary 4
  • proof
  • ...and 2 more