Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning

Xiang Fang, Shihua Zhang, Hao Zhang, Tao Lu, Huabing Zhou, Jiayi Ma

TL;DR

CorrMamba addresses two-view correspondence learning by integrating differentiable causal sequence learning with a local context graph module and a channel-aware Mamba filter to selectively mine information from true correspondences while suppressing outliers. The architecture comprises a Causal Sequence Learning Block, Local Graph Pattern Learning, and Channel-Aware Mamba Filter, culminating in an inlier predictor and robust essential matrix estimation. Empirical results on outdoor relative pose estimation and visual localization demonstrate state-of-the-art performance, including a notable 2.58 percentage point improvement in AUC@20° outdoors, along with strong ablations validating each component. The work delivers a scalable, permutation-invariant, and generalizable framework that effectively prunes mismatches while preserving geometry-critical information for downstream vision tasks.

Abstract

Two-view correspondence learning aims to distinguish true from false correspondences between image pairs by recognizing the different information underlying them. Previous methods either treat this information equally or require explicit storage of the entire context, which tends to be laborious in real-world scenarios. Inspired by Mamba's inherent selectivity, we propose CorrMamba, a Correspondence filter that leverages Mamba's ability to selectively mine information from true correspondences while mitigating interference from false ones, thus achieving adaptive focus at a lower cost. To prevent Mamba from being affected by unordered keypoints, which obscure its ability to mine spatial information, we customize a causal sequential learning approach based on the Gumbel-Softmax technique to establish causal dependencies between features in a fully autonomous and differentiable manner. Additionally, a local-context enhancement module is designed to capture critical contextual cues essential for correspondence pruning, complementing the core framework. Extensive experiments on relative pose estimation, visual localization, and analysis demonstrate that CorrMamba achieves state-of-the-art performance. Notably, in outdoor relative pose estimation, our method surpasses the previous SOTA by 2.58 absolute percentage points in AUC@20°, highlighting its practical superiority. Our code will be publicly available.
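For readers unfamiliar with the Gumbel-Softmax technique named in the abstract, the following is a minimal NumPy sketch of the generic relaxation: Gumbel noise is added to per-item scores and a temperature-scaled softmax turns the result into a soft, differentiable selection. It is not the authors' sequentialization module, whose details are not given in this summary. The function name `gumbel_softmax`, the temperature `tau`, and the example scores are assumptions chosen for illustration.

```python
import numpy as np


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed one-hot sample over the last axis of `logits` (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    # Sample Gumbel(0, 1) noise by inverse transform; clip to keep both logs finite.
    u = np.clip(rng.random(logits.shape), 1e-10, 1.0 - 1e-10)
    gumbel = -np.log(-np.log(u))
    # Temperature-scaled softmax of the perturbed scores.
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical scores for three correspondences (assumed values).
scores = np.array([2.0, 0.5, -1.0])
weights = gumbel_softmax(scores, tau=0.5, rng=np.random.default_rng(0))
```

A lower `tau` pushes `weights` toward a one-hot vector, while a higher `tau` makes it closer to uniform. In an autodiff framework the same computation lets gradients flow through the selection, which is what makes an ordering or selection step trainable end to end.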

Paper Structure

This paper contains 18 sections, 19 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Illustration of different filters of linear complexity for correspondence learning. False matches are shown in red (—), and correct ones in green (—). The transition of the yellow hue from closer to red to closer to green in (c) signifies a progression from lower to higher weights, meaning the model is more likely to consider the match as an inlier and ignore outliers at the same time.
  • Figure 2: Framework diagram of CorrMamba. We use the putative set of correspondences obtained by off-the-shelf detectors as input and finally obtain the inlier probability of each correspondence through the network.
  • Figure 3: Architecture of Channel-Aware Mamba Filter. Note that residual connections have been omitted for visual clarity.
  • Figure 4: Qualitative illustration of outlier rejection. False matches are marked in red (—) and correct matches in green (—). The relative pose estimation results (rotation and translation errors) are provided in the top left corner of each image pair. Please zoom in for a better view.
  • Figure 5: Visualization of Sequentialization. We sort the keypoints based on their scores, with red denoting higher rankings, i.e. higher scores, and blue indicating lower rankings, i.e. lower scores.