Image Matching Filtering and Refinement by Planes and Beyond
Fabio Bellavia, Zhenjun Zhao, Luca Morelli, Fabio Remondino
TL;DR
This work targets robust image matching by filtering and refining sparse correspondences without relying on deep learning. It introduces three geometry-based modules: Multiple Overlapping Planes (MOP), Middle Homography (MiHo), and Normalized Cross-Correlation (NCC). These operate before or alongside RANSAC to prune outliers and improve keypoint localization, even when camera intrinsics are unavailable. MiHo distributes the patch deformation across both patches to reduce distortion, while NCC offers optional sub-pixel refinement within a canonical plane established by the plane hypotheses. Extensive evaluations on planar and non-planar datasets, using nine base matching pipelines and ablation studies, show that MOP-based filtering and MiHo+NCC can closely approach or even exceed the pose accuracy of deep methods, particularly when camera intrinsics are missing. The results also highlight the robustness and explainability of the handcrafted approach. The findings suggest meaningful potential for integrating these geometry-driven filters into future deep image matching architectures, combining interpretability with learning-based gains.
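The NCC refinement step can be illustrated with a minimal sketch, assuming both patches have already been warped into a common frame (for example by the plane homographies), so that only a small residual translation remains. The names `ncc`, `ncc_refine` and `blob`, the exhaustive window search, and the 1D parabolic peak interpolation are hypothetical illustrative choices, not necessarily the authors' implementation.

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    den = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / den) if den > 0 else 0.0


def _parabola_offset(m, c, p):
    # Vertex of the parabola through (-1, m), (0, c), (1, p); 0 if not a maximum.
    d = m - 2.0 * c + p
    return 0.5 * (m - p) / d if d < 0 else 0.0


def ncc_refine(template, search):
    """Exhaustive NCC search of `template` inside `search` (hypothetical helper).

    Returns (dy, dx, score): the sub-pixel shift of the best match relative to
    the centered position, and the NCC score at the integer peak.
    """
    th, tw = template.shape
    ny, nx = search.shape[0] - th + 1, search.shape[1] - tw + 1
    scores = np.empty((ny, nx))
    for y in range(ny):
        for x in range(nx):
            scores[y, x] = ncc(template, search[y:y + th, x:x + tw])
    py, px = np.unravel_index(np.argmax(scores), scores.shape)
    oy = ox = 0.0
    if 0 < py < ny - 1:  # refine only when both vertical neighbors exist
        oy = _parabola_offset(scores[py - 1, px], scores[py, px], scores[py + 1, px])
    if 0 < px < nx - 1:  # refine only when both horizontal neighbors exist
        ox = _parabola_offset(scores[py, px - 1], scores[py, px], scores[py, px + 1])
    # Shift relative to the center of the score map (zero displacement).
    return py + oy - (ny - 1) / 2, px + ox - (nx - 1) / 2, scores[py, px]


def blob(shape, cy, cx, sigma=2.0):
    """Synthetic, well-localized test pattern: an isotropic Gaussian blob."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))


template = blob((11, 11), 5.0, 5.0)
search = blob((21, 21), 10.3, 9.8)  # true shift w.r.t. the center: (0.3, -0.2)
dy, dx, score = ncc_refine(template, search)
print(f"dy={dy:.3f} dx={dx:.3f} ncc={score:.3f}")
```

Parabolic interpolation on the correlation peak is a common, cheap way to obtain sub-pixel offsets; it assumes the peak is smooth and well localized, which is consistent with the observation that such refinement helps mainly for corner-like keypoints.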
Abstract
This paper introduces a modular, non-deep learning method for filtering and refining sparse correspondences in image matching. Assuming that the motion flow within the scene can be approximated by local homography transformations, matches are aggregated into overlapping clusters corresponding to virtual planes using an iterative RANSAC-based approach that discards incompatible correspondences. Moreover, the underlying planar structure provides an explicit map between the local patches associated with the matches, which can optionally be used to refine the keypoint positions through cross-correlation template matching after patch reprojection. Finally, to improve robustness and fault tolerance against violations of the piecewise planar approximation, a further strategy minimizes the relative patch distortion in the plane reprojection by introducing an intermediate homography that projects both patches into a common plane. The proposed method is extensively evaluated on standard datasets and image matching pipelines, and compared with state-of-the-art approaches. Unlike other current comparisons, the proposed benchmark also takes into account the more general, real, and practical case in which camera intrinsics are unavailable. Experimental results demonstrate that our non-deep learning, geometry-based filter is effective in the presence of outliers, and that the optional cross-correlation refinement step is beneficial when keypoints are corner-like. Finally, this study suggests that there is still significant development potential for practical image matching solutions in the considered research direction, which could in the future be incorporated into novel deep image matching architectures.
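The plane-based filtering idea can be approximated by a simplified sequential-RANSAC sketch, assuming matched keypoints in pixel coordinates and a reprojection-error threshold `thr`. Each round fits a homography by RANSAC, sampling only from matches not yet explained by a previous plane, but counting inliers over all matches so that clusters may overlap; matches assigned to no plane are discarded. All helper names and parameters (`plane_filter`, `thr`, `min_inliers`, `max_planes`, `iters`) are hypothetical, and this loop is not the authors' exact MOP procedure.

```python
import numpy as np


def _normalize(pts):
    # Hartley normalization: centroid at the origin, mean distance sqrt(2).
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return np.c_[pts, np.ones(len(pts))] @ T.T, T


def dlt_homography(src, dst):
    """Normalized DLT: H such that dst ~ H @ src (needs >= 4 matches)."""
    s, Ts = _normalize(src)
    d, Td = _normalize(dst)
    rows = []
    for (x, y, _), (u, v, _) in zip(s, d):
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = np.linalg.inv(Td) @ vt[-1].reshape(3, 3) @ Ts
    return H / H[2, 2]


def apply_h(H, pts):
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:3]


def ransac_homography(p1, p2, candidates, thr, iters, rng):
    """Best inlier mask over ALL matches, sampling minimal sets from `candidates`."""
    best = np.zeros(len(p1), dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):
        for _ in range(iters):
            idx = rng.choice(candidates, 4, replace=False)
            H = dlt_homography(p1[idx], p2[idx])
            err = np.linalg.norm(apply_h(H, p1) - p2, axis=1)
            inl = err < thr  # NaN errors from degenerate samples compare False
            if inl[candidates].sum() > best[candidates].sum():
                best = inl
    return best


def plane_filter(p1, p2, thr=3.0, min_inliers=8, max_planes=10, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    covered = np.zeros(len(p1), dtype=bool)
    planes = []
    for _ in range(max_planes):
        candidates = np.flatnonzero(~covered)
        if len(candidates) < 4:
            break
        inl = ransac_homography(p1, p2, candidates, thr, iters, rng)
        if (inl & ~covered).sum() < min_inliers:  # too little new support: stop
            break
        planes.append(inl)  # overlapping clusters: a match may be in several
        covered |= inl
    return covered, planes  # `covered` is the keep-mask of the filter


# Synthetic scene: two planes (100 matches each) plus 30 random outliers.
rng = np.random.default_rng(1)
HA = np.array([[1.0, 0.0, 20.0], [0.0, 1.0, 10.0], [0.0, 0.0, 1.0]])
HB = np.array([[0.9, -0.1, 50.0], [0.1, 0.9, -20.0], [1e-4, 0.0, 1.0]])
a1 = rng.uniform([0, 0], [300, 480], size=(100, 2))
b1 = rng.uniform([340, 0], [640, 480], size=(100, 2))
p1 = np.vstack([a1, b1, rng.uniform([0, 0], [640, 480], size=(30, 2))])
p2 = np.vstack([apply_h(HA, a1), apply_h(HB, b1), rng.uniform([0, 0], [640, 480], size=(30, 2))])
keep, planes = plane_filter(p1, p2)
print(len(planes), keep[:200].sum(), keep[200:].sum())
```

Sampling only from unexplained matches lets later rounds discover smaller planes, while scoring against all matches keeps the clusters overlapping, which loosely mirrors the "overlapping virtual planes" described above.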
