A Revisit of the Normalized Eight-Point Algorithm and A Self-Supervised Deep Solution

Bin Fan, Yuchao Dai, Yongduek Seo, Mingyi He

TL;DR

This work revisits the normalized eight-point algorithm and demonstrates that perfect conditioning, $\kappa(\hat{A}) = 1$, is unattainable with linear normalizations, motivating a data-driven approach. It introduces a self-supervised CNN that outputs the normalization matrices $T$ and $T'$ (with a parametric affine form $T_L$ controlled by $\alpha_1$, $\alpha_2$, and $\theta$) to improve the conditioning of the coefficient matrix used in DLT for fundamental matrix estimation, while enforcing the singularity constraint. The network is permutation-invariant and trained via a self-supervised symmetry epipolar loss that backpropagates through the SVD, enabling training without ground-truth labels. Experimental results on KITTI, TUM, and Cambridge datasets show per-sample improvements over Hartley's normalization, with good generalization and reasonable integration with RANSAC, indicating practical benefits for robust two-view geometry pipelines. The approach offers an interpretable, data-driven alternative to hand-crafted normalization that can enhance initialization and conditioning in multi-view geometry tasks.
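As a rough illustration of the error measure behind the self-supervised loss, the sketch below computes a symmetric epipolar distance with NumPy. It assumes the common definition (the sum of the point-to-epipolar-line distances in both images); the paper's exact loss may differ, for example by squaring the terms. The function name and the toy fundamental matrix, which corresponds to a pure sideways translation with identity intrinsics, are illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

def symmetric_epipolar_distance(F, pts1, pts2):
    """Sum of point-to-epipolar-line distances in both images.

    F is a 3x3 fundamental matrix with x2^T F x1 = 0;
    pts1 and pts2 are (N, 2) arrays of matched pixel coordinates.
    """
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    lines2 = x1 @ F.T   # rows are F x1: epipolar lines in image 2
    lines1 = x2 @ F     # rows are F^T x2: epipolar lines in image 1
    algebraic = np.abs(np.sum(x2 * lines2, axis=1))  # |x2^T F x1|
    d2 = algebraic / np.linalg.norm(lines2[:, :2], axis=1)
    d1 = algebraic / np.linalg.norm(lines1[:, :2], axis=1)
    return d1 + d2

# Toy check (assumed setup): F = [t]_x for t = (1, 0, 0), so epipolar
# lines are horizontal and each one-sided distance is |y1 - y2|.
F_toy = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0, 0.0]])
pts1 = np.array([[10.0, 20.0], [30.0, 40.0]])
pts2 = np.array([[15.0, 20.0], [35.0, 43.0]])
dist = symmetric_epipolar_distance(F_toy, pts1, pts2)
print(dist)  # first match lies on its epipolar line; second is 3 px off on each side
```

Because every step is differentiable away from degenerate lines, a distance of this kind can serve as a training signal without ground-truth fundamental matrices, which is the role the TL;DR attributes to the loss.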

Abstract

The normalized eight-point algorithm has been widely viewed as the cornerstone in two-view geometry computation, where the seminal Hartley's normalization has greatly improved the performance of the direct linear transformation algorithm. A natural question is whether other normalization methods exist that could further improve the performance for each input sample, and how to find them. In this paper, we provide a novel perspective and propose two contributions to this fundamental problem: 1) we revisit the normalized eight-point algorithm and make a theoretical contribution by presenting the existence of different and better normalization algorithms; 2) we introduce a deep convolutional neural network with a self-supervised learning strategy for normalization. Given eight pairs of correspondences, our network directly predicts the normalization matrices, thus learning to normalize each input sample. Our learning-based normalization module can be integrated with both traditional (e.g., RANSAC) and deep learning frameworks (affording good interpretability) with minimal effort. Extensive experiments on both synthetic and real images demonstrate the effectiveness of our proposed approach.
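For orientation, the classical baseline the abstract refers to can be sketched as follows: Hartley's normalization, the DLT solve, singularity constraint enforcement, and denormalization. This is a minimal NumPy sketch of the textbook pipeline, not the authors' code. The function names and the synthetic scene (intrinsics `K`, rotation `R`, translation `t`) are assumptions made for the example. The optional `T1` and `T2` arguments only mark where other normalization matrices, such as ones predicted by a network, could be substituted.

```python
import numpy as np

def hartley_normalization(pts):
    """3x3 similarity: centroid to the origin, mean distance to it sqrt(2)."""
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.linalg.norm(pts - centroid, axis=1).mean()
    return np.array([[scale, 0.0, -scale * centroid[0]],
                     [0.0, scale, -scale * centroid[1]],
                     [0.0, 0.0, 1.0]])

def homogeneous(pts):
    return np.hstack([pts, np.ones((len(pts), 1))])

def eight_point(pts1, pts2, T1=None, T2=None):
    """Estimate F with x2^T F x1 = 0 from N >= 8 matches ((N, 2) arrays)."""
    T1 = hartley_normalization(pts1) if T1 is None else T1
    T2 = hartley_normalization(pts2) if T2 is None else T2
    x1 = homogeneous(pts1) @ T1.T
    x2 = homogeneous(pts2) @ T2.T
    # DLT: each row is the outer product of x2 and x1, flattened row-major,
    # so that row . vec(F_hat) = x2^T F_hat x1.
    A = np.einsum("ni,nj->nij", x2, x1).reshape(len(x1), 9)
    F_hat = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Singularity constraint enforcement: zero the smallest singular value.
    U, S, Vt = np.linalg.svd(F_hat)
    F_hat = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization and fix the arbitrary scale.
    F = T2.T @ F_hat @ T1
    return F / np.linalg.norm(F)

# Synthetic two-view scene (assumed values, noise-free).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(8, 3)) + np.array([0.0, 0.0, 5.0])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.0, 0.1])

def project(P):
    p = P @ K.T
    return p[:, :2] / p[:, 2:3]

pts1 = project(X)
pts2 = project(X @ R.T + t)
F = eight_point(pts1, pts2)
# Algebraic epipolar residual |x2^T F x1| for each match.
residual = np.abs(np.sum(homogeneous(pts2) * (homogeneous(pts1) @ F.T), axis=1))
```

With exact correspondences the residuals should sit at floating-point noise level; with noisy matches, the choice of `T1` and `T2` is what changes the accuracy of the estimate.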

Paper Structure (12 sections, 2 theorems, 8 equations, 9 figures, 3 tables)

Key Result

Proposition

There is no pair of normalization matrices $T'$ and $T$ that results in $\kappa(\hat{A}) = 1$.
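A small numerical experiment can make the statement concrete, though it is not a proof. The sketch below builds the coefficient matrix for eight matches with and without Hartley's normalization and compares condition numbers. It assumes $\kappa$ is the ratio of the largest to the eighth singular value of the $8 \times 9$ matrix, which may not match the paper's exact definition. The matches are random synthetic pixel coordinates, not data from a real scene, and all function names are illustrative.

```python
import numpy as np

def hartley_normalization(pts):
    """3x3 similarity: centroid to the origin, mean distance to it sqrt(2)."""
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.linalg.norm(pts - centroid, axis=1).mean()
    return np.array([[scale, 0.0, -scale * centroid[0]],
                     [0.0, scale, -scale * centroid[1]],
                     [0.0, 0.0, 1.0]])

def coefficient_matrix(pts1, pts2, T1, T2):
    """DLT coefficient matrix after applying T1 and T2 to the two point sets."""
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))]) @ T1.T
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))]) @ T2.T
    return np.einsum("ni,nj->nij", x2, x1).reshape(len(x1), 9)

def condition_number(A):
    # Assumed definition: largest over eighth singular value of the 8x9 matrix.
    s = np.linalg.svd(A, compute_uv=False)
    return s[0] / s[7]

# Eight synthetic matches in a 640x480 image (assumed, not from the paper).
rng = np.random.default_rng(0)
pts1 = rng.uniform([0.0, 0.0], [640.0, 480.0], size=(8, 2))
pts2 = rng.uniform([0.0, 0.0], [640.0, 480.0], size=(8, 2))

identity = np.eye(3)
kappa_raw = condition_number(coefficient_matrix(pts1, pts2, identity, identity))
kappa_hartley = condition_number(
    coefficient_matrix(pts1, pts2,
                       hartley_normalization(pts1),
                       hartley_normalization(pts2)))
print(f"raw pixels: {kappa_raw:.3e}, Hartley: {kappa_hartley:.3e}")
```

Under these assumptions Hartley's normalization should reduce the condition number by orders of magnitude while leaving it above 1, which is consistent with the proposition and leaves room for a per-sample choice of $T$ and $T'$.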

Figures (9)

  • Figure 1: Distributions of normalized image coordinates by using Hartley’s normalization algorithm (upper right) and our learning-based normalization approach (bottom right), respectively. Eight pairs of point correspondences are obtained from the two street images on the left. Note that in the right figure, the coordinate axes represent the normalized image coordinates in the horizontal and vertical directions, and the "error" refers to the symmetry epipolar distance, which can better characterize the estimation accuracy of the two-view geometry. Our approach learns a robust normalization scheme adapted to the input data, obtains a better distribution spread of the normalized point coordinates, and eventually leads to improved performance in the computation of the fundamental matrix.
  • Figure 2: Overall framework comparisons of Hartley's eight-point algorithm and our learning-based eight-point algorithm, both of which support eight points as input. Our approach shows an interpretable pipeline to predict the parameters of each normalization matrix ($\alpha_1$, $\alpha_2$, and $\theta$ in particular), which is also beneficial for a more accurate estimation of the intrinsic epipolar geometry. DLT refers to the direct linear transformation and SCE refers to the singularity constraint enforcement.
  • Figure 3: Overview of our network architecture, corresponding to the CNN layer in Fig. 2. Our approach estimates the parameters of the normalization matrix ($\alpha_1$, $\alpha_2$, and $\theta$ in particular). 2D convolutional layer refers to two-dimensional convolutional layer.
  • Figure 4: (a) Average per-sample pixel errors, with and without optimization, for the first 20 frames of sequence "06". Our direct results are almost the same as those based on Hartley with optimization. (b) Average pixel error of each sample for the different eight-point methods. We discard the input samples with an original eight-point error greater than 60 for better visualization.
  • Figure 5: Learning-based normalized distances from the origin on the left and right camera views, respectively. Hartley's normalization makes them always $\sqrt{2}$, while our approach learns a robust normalization scheme adapted to the input data.
  • ...and 4 more figures

Theorems & Definitions (4)

  • Proposition
  • Proof
  • Proposition
  • Proof