Monte Carlo Diffusion for Generalizable Learning-Based RANSAC

Jiale Wang, Chen Zhao, Wei Ke, Tong Zhang

TL;DR

This work tackles the limited generalization of learning-based RANSAC for robust feature matching by introducing a diffusion-based training paradigm augmented with Monte Carlo sampling. Ground-truth correspondences are progressively diffused to generate diverse noisy variants, and multi-stage randomization (MSR) expands distribution coverage, enabling RANSAC learners to generalize to unseen data sources. Experiments on ScanNet and MegaDepth show significant improvements in out-of-distribution generalization for NG-RANSAC and related methods. Ablations confirm the contributions of diffusion and MSR, as well as compatibility with multiple RANSAC variants. The approach maintains competitive in-distribution performance and offers a practical path to robust estimation on data from unfamiliar matchers or environments.

Abstract

Random Sample Consensus (RANSAC) is a fundamental approach for robustly estimating parametric models from noisy data. Existing learning-based RANSAC methods utilize deep learning to enhance the robustness of RANSAC against outliers. However, these approaches are trained and tested on data generated by the same algorithms, leading to limited generalization to out-of-distribution data during inference. To address this, we introduce a novel diffusion-based paradigm that progressively injects noise into ground-truth data, simulating the noisy conditions for training learning-based RANSAC. To enhance data diversity, we incorporate Monte Carlo sampling into the diffusion paradigm, approximating diverse data distributions by introducing different types of randomness at multiple stages. We evaluate our approach in the context of feature matching through comprehensive experiments on the ScanNet and MegaDepth datasets. The experimental results demonstrate that our Monte Carlo diffusion mechanism significantly improves the generalization ability of learning-based RANSAC. We also conduct extensive ablation studies that highlight the effectiveness of key components in our framework.

Paper Structure

This paper contains 17 sections, 9 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Advantage of Monte Carlo diffusion. Model-1 and Model-2 denote NG-RANSAC (Brachmann and Rother, 2019) trained on SIFT (Lowe, 2004) and LoFTR (Sun et al., 2021), respectively. The green lines indicate inliers, and the red ones are outliers. As shown in the blue box, the models trained on specific patterns show limited generalization on out-of-distribution data, e.g., Model-2 trained on LoFTR performs poorly when tested on SIFT. In contrast, we propose a diffusion-based training mechanism where training data is agnostic to specific patterns through a Monte Carlo diffusion process. NG-RANSAC trained on diffused matches demonstrates better generalization across different initial matches.
  • Figure 2: Pipeline of the diffusion process. We leverage diffusion to simulate noisy data for training learning-based RANSAC. Given ground-truth matches $\mathbf{C}_{\text{gt}}$ between two images, we randomly split them into two subsets $\mathbf{C}_{\text{gt}}^{a}$ and $\mathbf{C}_{\text{gt}}^{b}$. $\mathbf{C}_{\text{gt}}^{a}$ is processed by a Monte Carlo diffusion module with multi-stage randomization, generating multiple sets of noised matches at different timesteps. The final diffused matches are formed by combining $\mathbf{C}_{\text{gt}}^{b}$ as inliers with $\mathbf{C}_n^{b}$ sampled at timestep $t_i$ as outliers. The learning-based RANSAC is then trained on the resulting diffused matches.
  • Figure 3: Illustration of the multi-stage randomization module. We randomly sample the three hyperparameters, timestep $t$, diffusion ratio $r$, and noise scale $s$, in the diffusion mechanism. This multi-stage randomization introduces different sources of randomness into the noised matches, affecting the diffusion intensity, outlier ratio, and noise level, respectively. Invalid matches in the tentative set are replaced by randomly sampled matches, which ensures the validity of the final diffused matches.
  • Figure 4: Visualization of diffused matches. Given the same image pair, different values of diffusion ratio and noise scale result in significantly different diffused matches.
  • Figure 5: Qualitative results. Init. Matches represent the initial correspondences generated by SIFT. Baseline and Ours indicate the pruned results using NG-RANSAC trained on LoFTR and diffused matches, respectively. The green and red lines denote inliers and outliers. The baseline shows limited generalization to SIFT, which serves as out-of-distribution data, leading to many outliers after the pruning. In contrast, our method achieves significantly better generalization, identifying more inliers.
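
The pipeline described in the captions of Figures 2 and 3 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `diffuse_matches`, the Gaussian noise model, the sampling ranges for $t$, $r$, and $s$, the `step_sigma` parameter, the choice to perturb only second-image keypoints, and the "keypoint left the image" validity rule are all hypothetical choices made for the example; the captions do not specify them.

```python
import numpy as np


def diffuse_matches(c_gt, image_size, rng, num_steps=50, step_sigma=2.0):
    """Turn ground-truth matches into a noisy training set (illustrative sketch).

    c_gt: (N, 4) array of matches (x1, y1, x2, y2) in pixels.
    image_size: (width, height), assumed identical for both images.
    Returns (matches, labels) with label 1 = inlier, 0 = outlier.
    """
    n = len(c_gt)
    width, height = image_size

    # Multi-stage randomization: draw the three hyperparameters per sample.
    # The ranges below are assumptions, not values from the paper.
    t = int(rng.integers(1, num_steps + 1))  # timestep -> diffusion intensity
    r = rng.uniform(0.1, 0.9)                # diffusion ratio -> outlier ratio
    s = rng.uniform(0.5, 2.0)                # noise scale -> noise level

    # Random split: subset a is diffused, subset b is kept clean.
    perm = rng.permutation(n)
    n_a = int(round(r * n))
    c_a = c_gt[perm[:n_a]].astype(float)
    c_b = c_gt[perm[n_a:]].astype(float)

    # Progressive noise injection on the second-image keypoints of subset a
    # (assumed noise model: independent Gaussian steps).
    for _ in range(t):
        c_a[:, 2:] += rng.normal(0.0, s * step_sigma, size=(n_a, 2))

    # Assumed validity rule: a match is invalid if a keypoint left the image.
    invalid = (
        (c_a[:, 2] < 0) | (c_a[:, 2] >= width)
        | (c_a[:, 3] < 0) | (c_a[:, 3] >= height)
    )
    # Replace invalid matches with uniformly sampled random matches.
    k = int(invalid.sum())
    c_a[invalid] = rng.uniform(0.0, [width, height, width, height], size=(k, 4))

    # Clean subset b acts as inliers, diffused subset a as outliers.
    matches = np.vstack([c_b, c_a])
    labels = np.concatenate(
        [np.ones(len(c_b), dtype=int), np.zeros(n_a, dtype=int)]
    )
    return matches, labels
```

Calling `diffuse_matches(c_gt, (640, 480), np.random.default_rng(0))` repeatedly on the same image pair yields a different outlier ratio and noise level on each call, which is the kind of variation Figure 4 visualizes. Labeling every diffused match as an outlier is a simplification: a match that receives very little noise may still be geometrically consistent.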