FLIGHT: Fibonacci Lattice-based Inference for Geometric Heading in real-Time

David Dirnfeld, Fabien Delattre, Pedro Miraldo, Erik Learned-Miller

TL;DR

FLIGHT is a novel generalization of the Hough transform on the unit sphere ($\mathcal{S}^2$) for estimating the camera's heading. It lies on the Pareto frontier of accuracy versus efficiency on three datasets, and it reduces RMSE in SLAM by correcting the heading during camera pose initialization.

Abstract

Estimating camera motion from monocular video is a fundamental problem in computer vision, central to tasks such as SLAM, visual odometry, and structure-from-motion. Existing methods that recover the camera's heading under known rotation, whether from an IMU or an optimization algorithm, tend to perform well in low-noise, low-outlier conditions, but often decrease in accuracy or become computationally expensive as noise and outlier levels increase. To address these limitations, we propose a novel generalization of the Hough transform on the unit sphere ($\mathcal{S}^2$) to estimate the camera's heading. First, the method extracts correspondences between two frames and generates a great circle of directions compatible with each pair of correspondences. Then, by discretizing the unit sphere using a Fibonacci lattice as bin centers, each great circle casts votes for a range of directions, ensuring that features unaffected by noise or dynamic objects vote consistently for the correct motion direction. Experimental results on three datasets demonstrate that the proposed method is on the Pareto frontier of accuracy versus efficiency. Additionally, experiments on SLAM show that the proposed method reduces RMSE by correcting the heading during camera pose initialization.
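
The voting idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the correspondences are unit bearing vectors with the known rotation already compensated, uses a single-stage hard threshold `tol` in place of the paper's weighted votes, and all function names and parameter values are hypothetical. Under these assumptions, each correspondence `(x1, x2)` constrains the translation direction to the great circle with normal `x1 × x2`, and a bin receives a vote when its center lies within `tol` of that circle. Because a great circle passes through antipodal points, this sketch recovers the heading only up to sign.

```python
import numpy as np

def fibonacci_lattice(num_bins):
    """Return num_bins roughly uniformly spread unit vectors (bin centers)."""
    i = np.arange(num_bins)
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    z = 1.0 - (2.0 * i + 1.0) / num_bins      # evenly spaced heights in (-1, 1)
    radius = np.sqrt(1.0 - z * z)
    phi = golden_angle * i
    return np.stack([radius * np.cos(phi), radius * np.sin(phi), z], axis=1)

def great_circle_normals(x1, x2):
    """Normal of the great circle of directions compatible with each pair.

    Assumes x1, x2 are (N, 3) unit bearing vectors with rotation removed.
    """
    n = np.cross(x1, x2)
    norms = np.linalg.norm(n, axis=1, keepdims=True)
    keep = norms[:, 0] > 1e-9                 # drop pairs with no apparent motion
    return n[keep] / norms[keep]

def vote_heading(normals, bins, tol=0.02):
    """Hard-threshold voting (a simplification of weighted voting)."""
    close = np.abs(normals @ bins.T) < tol    # (num_circles, num_bins)
    votes = close.sum(axis=0)
    return bins[np.argmax(votes)], votes

# Synthetic check (hypothetical data): static 3D points seen before and after
# a pure translation along t_true.
rng = np.random.default_rng(0)
points = rng.uniform([-5, -5, 4], [5, 5, 10], size=(300, 3))
t_true = np.array([0.3, 0.1, 1.0])
t_true /= np.linalg.norm(t_true)
x1 = points / np.linalg.norm(points, axis=1, keepdims=True)
moved = points - 0.5 * t_true
x2 = moved / np.linalg.norm(moved, axis=1, keepdims=True)

bins = fibonacci_lattice(2000)
heading, votes = vote_heading(great_circle_normals(x1, x2), bins, tol=0.05)
```

The accuracy of the winning bin is limited by the lattice spacing, which is one motivation for refining the search with a denser lattice around the winner.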

Paper Structure

This paper contains 36 sections, 11 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Left. A frame of a dynamic scene from the Sintel dataset. The blue vectors show translational flows compatible with the winning Fibonacci bin $\mathbf{s}_j$ as shown by the green point. Gray vectors show translational flows that do not agree on a direction. Right. The unit sphere of the possible directions a camera can move. The red points show a sparse sampling of the unit sphere (i.e., Fibonacci bins) with our hierarchical approach. The blue great circles (corresponding to the blue flow vectors in the frame) represent great circles that vote for a unique direction, whereas the gray great circles do not.
  • Figure 2: In the case of pure translation flow, two pairs of point correspondences are needed to obtain the direction of motion. On the left, we sample two flow vectors (indicated by the red and blue arrows). On the right, we show the great circles of translation directions compatible with these flow vectors. The points of intersection are the only directions compatible with both flow vectors.
  • Figure 3: We employ a two-stage voting scheme on the unit sphere $\mathcal{S}^2$, which represents the possible camera translation directions between two consecutive video frames, using a Fibonacci lattice. Left: In the first stage, the red points represent a sparse Fibonacci lattice. The blue and gray arcs across the sphere are the great circles representing the camera motion directions compatible with a single pair of feature correspondences. Our goal is to find the point on the sphere compatible with the largest number of features. Right: In the second stage, the red points represent a dense sampling of the Fibonacci lattice in the winning region of the first stage. The yellow dot shows the ground truth translation, and the purple point is the winning bin.
  • Figure 4: The angle $\theta_{i, j}$ between the normal $\mathbf{n}_i$ of a great circle $i$ and a Fibonacci bin center $\mathbf{s}_j$. When $\theta_{i, j}$ is close to $\pi/2$, the great circle intersects the region around $\mathbf{s}_j$ defined by $r$. Because the bin is approximately flat, we use the Pythagorean theorem to solve for $b_{i, j}$. The length $a_{i, j}$ of the intersection is therefore $2b_{i, j}$, which is the amount that great circle $i$ votes for $\mathbf{s}_j$.
  • Figure 5: Left: We compare the sample size $N$ vs. the runtime (ms) for FLIGHT for different values of $p \in \{1\%, 5\%, 20\%, 50\%, 90\%\}$. Right: We plot sample size $N$ vs. accuracy under the same conditions.
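
The geometry in the captions of Figures 2 and 4 can be sketched in a few lines. This is one plausible reading of the captions, not the paper's exact formulation: it assumes $r$ is the angular radius of a bin, takes the distance from the bin center to the great circle as $|\pi/2 - \theta_{i, j}|$, and applies the flat-bin Pythagorean step $b_{i, j} = \sqrt{r^2 - d^2}$. The function names are hypothetical.

```python
import numpy as np

def intersect_great_circles(n1, n2):
    """Two great circles (given by unit normals) meet at two antipodal points."""
    d = np.cross(n1, n2)
    d = d / np.linalg.norm(d)
    return d, -d

def chord_votes(normals, bins, r):
    """Vote a_ij = 2 * b_ij of great circle i for bin j (flat-bin approximation).

    normals: (M, 3) unit normals n_i; bins: (K, 3) unit bin centers s_j;
    r: assumed angular radius of a bin, in radians.
    """
    cos_theta = np.clip(normals @ bins.T, -1.0, 1.0)
    theta = np.arccos(cos_theta)              # angle between n_i and s_j
    d = np.abs(np.pi / 2 - theta)             # offset of the circle from the bin center
    b_sq = np.maximum(r ** 2 - d ** 2, 0.0)   # zero when the circle misses the bin
    return 2.0 * np.sqrt(b_sq)
```

Under this reading, a great circle passing through the bin center contributes the full diameter `2 * r`, and the vote falls smoothly to zero as the circle moves toward the bin's edge.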