Image Matching Filtering and Refinement by Planes and Beyond

Fabio Bellavia, Zhenjun Zhao, Luca Morelli, Fabio Remondino

TL;DR

This work targets robust image matching by filtering and refining sparse correspondences without relying on deep learning. It introduces three geometry-based modules, namely Multiple Overlapping Planes (MOP), Middle Homography (MiHo), and Normalized Cross-Correlation (NCC), that operate before or alongside RANSAC to prune outliers and improve keypoint localization, even when camera intrinsics are unavailable. MiHo distributes the patch deformation across both patches to reduce distortion, while NCC offers optional sub-pixel refinement within a canonical plane established by the plane hypotheses. Extensive evaluations on planar and non-planar datasets, with nine base pipelines and ablation studies, show that MOP-based filtering and MiHo+NCC can closely approach or exceed the pose accuracy of deep methods, particularly when camera intrinsics are missing; the results also highlight the robustness and explainability of the handcrafted approach. The findings suggest meaningful potential for integrating these geometry-driven filters into future deep image matching architectures, combining interpretability with learning-based gains.
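
The MOP filtering idea (grouping matches into overlapping virtual planes through repeated RANSAC homography fits, then discarding matches that fit no plane) can be illustrated with a minimal NumPy sketch. It is a simplified greedy variant written under stated assumptions, not the authors' implementation: the function names (`fit_homography`, `ransac_homography`, `cluster_planes`), the pixel threshold, the minimum cluster size and the stopping rule are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_homography(src, dst):
    """Least-squares DLT estimate of the 3x3 homography mapping src -> dst (N >= 4 points)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value


def transfer_error(H, src, dst):
    """Euclidean distance between H(src) and dst for every match (inf if undefined)."""
    pts = np.c_[src, np.ones(len(src))] @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = pts[:, :2] / pts[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
    return np.nan_to_num(err, nan=np.inf)


def ransac_homography(src, dst, thr, iters, rng):
    """Plain RANSAC: boolean inlier mask of the best 4-point homography hypothesis."""
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), size=4, replace=False)
        inliers = transfer_error(fit_homography(src[sample], dst[sample]), src, dst) < thr
        if inliers.sum() > best.sum():
            best = inliers
    return best


def cluster_planes(src, dst, thr=3.0, min_inliers=8, iters=500, max_planes=10, seed=0):
    """Greedy extraction of overlapping virtual planes (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    active = np.ones(len(src), dtype=bool)  # matches still available for sampling
    keep = np.zeros(len(src), dtype=bool)   # matches assigned to at least one plane
    planes = []
    for _ in range(max_planes):
        idx = np.flatnonzero(active)
        if len(idx) < min_inliers:
            break
        inliers = ransac_homography(src[idx], dst[idx], thr, iters, rng)
        if inliers.sum() < min_inliers:
            break  # no sufficiently supported plane is left
        H = fit_homography(src[idx[inliers]], dst[idx[inliers]])  # refit on the consensus set
        members = transfer_error(H, src, dst) < thr  # tested on ALL matches: planes may overlap
        planes.append((H, members))
        keep |= members
        active[idx[inliers]] = False  # withdraw the consensus set from further sampling
    return planes, keep  # ~keep marks the discarded (outlier) matches
```

Plane membership is evaluated on all matches, so a match can belong to several planes (the clusters overlap), while only the consensus set of each new plane is withdrawn from further sampling; matches that end up in no cluster are treated as outliers.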

Abstract

This paper introduces a modular, non-deep learning method for filtering and refining sparse correspondences in image matching. Assuming that the motion flow within the scene can be approximated by local homography transformations, matches are aggregated into overlapping clusters corresponding to virtual planes using an iterative RANSAC-based approach that discards incompatible correspondences. Moreover, the underlying planar structure provides an explicit map between the local patches associated with the matches, which can optionally be used to refine the keypoint positions through cross-correlation template matching after patch reprojection. Finally, to enhance robustness and fault tolerance against violations of the piecewise planar approximation assumption, a further strategy minimizes the relative patch distortion in the plane reprojection by introducing an intermediate homography that projects both patches into a common plane. The proposed method is extensively evaluated on standard datasets and image matching pipelines, and compared with state-of-the-art approaches. Unlike other current comparisons, the proposed benchmark also takes into account the more general, real-world, and practical case where camera intrinsics are unavailable. Experimental results demonstrate that the proposed non-deep learning, geometry-based filter is effective in the presence of outliers, and that the optional cross-correlation refinement step is valid for corner-like keypoints. Finally, this study suggests that there is still significant development potential for practical image matching solutions in this research direction, which could in the future be incorporated into novel deep image matching architectures.
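
The optional cross-correlation refinement can be sketched as follows, assuming the two patches have already been reprojected into a common plane by the corresponding homography, so that the search window is a plain crop. The exhaustive integer search over a small window, followed by a separable parabolic fit of the NCC peak to obtain sub-pixel offsets, is an illustrative choice; the names `ncc`, `parabola_peak` and `refine_offset` and the search radius are hypothetical and not taken from the paper.

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def parabola_peak(left, center, right):
    """Vertex offset of the parabola through (-1, left), (0, center), (1, right).

    The offset lies in [-0.5, 0.5] when `center` is the largest of the three values.
    """
    d = left - 2.0 * center + right
    return 0.5 * (left - right) / d if d < 0 else 0.0


def refine_offset(template, search, radius):
    """Sub-pixel (dy, dx) shift aligning `template` within `search`, plus the peak NCC.

    `search` must be larger than `template` by 2 * radius pixels along each axis,
    so that a zero shift corresponds to the centered window.
    """
    h, w = template.shape
    size = 2 * radius + 1
    scores = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            scores[i, j] = ncc(template, search[i:i + h, j:j + w])
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    # Separable parabolic interpolation around the integer peak (skipped on the border).
    di = parabola_peak(scores[i - 1, j], scores[i, j], scores[i + 1, j]) if 0 < i < size - 1 else 0.0
    dj = parabola_peak(scores[i, j - 1], scores[i, j], scores[i, j + 1]) if 0 < j < size - 1 else 0.0
    return (i - radius + di, j - radius + dj), float(scores[i, j])
```

The returned peak score could serve as a confidence value, for instance to reject refinements on weakly textured patches (a hypothetical policy); this is relevant because the abstract reports the refinement as valid for corner-like keypoints.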

Paper Structure

This paper contains 66 sections, 71 equations, 13 figures, 11 tables.

Figures (13)

  • Figure 1: MOP+MiHo clustering and filtered matches for some example image pairs. Each combination of marker and color is associated with a unique virtual planar homography, as described in the MOP and MiHo sections, while discarded matches are indicated by black diamonds. The matching pipeline employed is the one relying on Key.Net, described in the base pipeline section. The images of the middle and bottom rows belong to MegaDepth and ScanNet respectively, described later in the dataset section. Best viewed in color and zoomed in.
  • Figure 2: Visual comparisons between the standard planar homography, the MiHo paired homographies and the VSAC half homography (the latter two detailed in the MiHo section), when the images share almost the same common area. (a) Two corresponding images, framed in blue, are used in turn as reference in the two top rows, while the other one is reprojected by a planar homography, shown in the same row. In the case of MiHo, both images are warped, as shown in the third row, yet the overall distortion is reduced. The VSAC half homography, shown in the bottom row, works similarly to MiHo. The local transformation is better highlighted in (b) by the related warping of the unit-square representation of the original images. Best viewed in color and zoomed in.
  • Figure 3: Visual comparisons between the standard planar homography, the MiHo paired homographies and the VSAC half homography (the latter two detailed in the MiHo section), when the images share only a partial common area. The same notation as in Fig. 2 is adopted. As shown in the second row, the homography distortion can break the plane convex hull. MiHo, depicted in the third row, deforms the unshared image areas less than the VSAC half homography, shown in the bottom row. Best viewed in color and zoomed in.
  • Figure 4: Midpoints and rotations with MiHo. The blue and red quadrilaterals are linked by a homography defined by the four corner correspondences, indicated by dashed gray lines. The midpoints of the corner correspondences define the reference green quadrilateral, from which the two MiHo planar homographies to the original quadrilaterals are derived. In the optimal MiHo configuration (a), the area of the quadrilateral defined by the midpoints is maximized. Incremental $90^\circ$ relative rotations between the two original quadrilaterals decrease this area through (b) to the minimum (c). From the minimum, further rotations increase the area through (d) back to the maximum. As a heuristic, in the best configuration the distance between any two midpoint corners should lie within the distances between the corresponding original corners of both images, as detailed in the section on MiHo rotations. Note that, in this specific example, the worst case also violates the multiview planar orientation constraints. Best viewed in color and zoomed in.
  • Figure 5: (a) Average distribution of the maximum epipolar error of the SuperPoint pipeline after incrementally applying MOP+MiHo, NCC and MAGSAC on MegaDepth. (b) Average distribution variation after applying NCC. The mean, median and standard deviation are shown as $\mu$, $\mu_e$ and $\sigma$, respectively. Notice the negative bump between 1 px and 5 px, together with the increase of both the left peak and the right tail, which indicates that NCC refinement succeeds in most cases but sometimes increases the errors. Best viewed in color and zoomed in.
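
The midpoint construction and the rotation check described in the Figure 4 caption can be sketched as below. This is a minimal illustration under assumptions: corners are given as 4×2 NumPy arrays in corresponding order, the second quadrilateral is rotated about its centroid in $90^\circ$ steps, and the names `miho_pair`, `rotate90` and `passes_distance_check` are hypothetical. The four-point DLT fit is a standard homography estimate, not necessarily the estimator used by the authors, and the paper's full rotation handling is not reproduced here.

```python
import numpy as np


def fit_homography(src, dst):
    """DLT homography mapping four (or more) src points onto dst."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def apply_h(H, pts):
    """Apply a homography to an (N, 2) array of points."""
    q = np.c_[pts, np.ones(len(pts))] @ H.T
    return q[:, :2] / q[:, 2:3]


def polygon_area(q):
    """Shoelace formula for a simple polygon given as an (N, 2) array."""
    x, y = q[:, 0], q[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def rotate90(q, k):
    """Rotate the corners by k * 90 degrees about their centroid."""
    c = q.mean(axis=0)
    R = np.linalg.matrix_power(np.array([[0.0, -1.0], [1.0, 0.0]]), k)
    return c + (q - c) @ R.T


def miho_pair(q1, q2):
    """Pick the 90-degree rotation of q2 maximizing the midpoint-quadrilateral area,
    then return it with the midpoint quadrilateral and the two homographies onto it."""
    k = max(range(4), key=lambda r: polygon_area((q1 + rotate90(q2, r)) / 2))
    mid = (q1 + rotate90(q2, k)) / 2
    # The original q2 corners keep their correspondence with mid, so H2 maps them directly.
    return k, mid, fit_homography(q1, mid), fit_homography(q2, mid)


def passes_distance_check(q1, q2, mid, eps=1e-9):
    """Caption heuristic: each midpoint corner distance lies within the two original ones."""
    for i in range(4):
        for j in range(i + 1, 4):
            d1 = np.linalg.norm(q1[i] - q1[j])
            d2 = np.linalg.norm(q2[i] - q2[j])
            dm = np.linalg.norm(mid[i] - mid[j])
            if not (min(d1, d2) - eps <= dm <= max(d1, d2) + eps):
                return False
    return True
```

For two squares related by a $180^\circ$ rotation, the unrotated midpoints collapse to a single point (zero area, the minimum case of the caption), whereas rotating the second square by $k = 2$ steps restores a full-size midpoint quadrilateral that also passes the distance check.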