RGM: A Robust Generalizable Matching Model

Songyan Zhang, Xinyu Sun, Hao Chen, Bo Li, Chunhua Shen

TL;DR

The generalization capacity of the proposed RGM (Robust Generalist Matching) is greatly improved by learning the matching and uncertainty estimation in a two-stage manner on the large, mixed data.

Abstract

Finding corresponding pixels within a pair of images is a fundamental computer vision task with various applications. Because tasks such as optical flow estimation and local feature matching have different requirements, previous works fall primarily into two categories, dense matching and sparse feature matching, each focusing on specialized architectures and task-specific datasets, which may hinder the generalization performance of the specialized models. In this paper, we propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching). In particular, we design a cascaded GRU module that refines matches by iteratively exploring the geometric similarity at multiple scales, followed by an additional uncertainty estimation module for sparsification. To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth by generating optical flow supervision with greater intervals. As such, we are able to mix various dense and sparse matching datasets, significantly improving the training diversity. The generalization capacity of our proposed RGM is greatly improved by learning the matching and uncertainty estimation in a two-stage manner on the large, mixed data. Superior performance is achieved for zero-shot matching and downstream geometry estimation across multiple datasets, outperforming previous methods by a large margin.
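
To make the sparsification idea in the abstract concrete, here is a minimal sketch of how a dense matching result could be reduced to sparse correspondences using a per-pixel reliability map. This is not the authors' implementation: the function name `sparsify_matches`, the confidence convention (values in [0, 1], higher is more reliable), and the fixed `threshold` are all assumptions made for illustration.

```python
import numpy as np


def sparsify_matches(flow, confidence, threshold=0.5):
    """Keep only correspondences whose confidence exceeds `threshold`.

    flow:       (H, W, 2) array of per-pixel displacements (dx, dy) from image A to image B.
    confidence: (H, W) array in [0, 1]; higher means more reliable
                (hypothetical convention, not taken from the paper).
    Returns two (N, 2) arrays of (x, y) coordinates: points in A and their matches in B.
    """
    flow = np.asarray(flow, dtype=np.float64)
    confidence = np.asarray(confidence, dtype=np.float64)
    if flow.ndim != 3 or flow.shape[2] != 2 or flow.shape[:2] != confidence.shape:
        raise ValueError("expected flow of shape (H, W, 2) and confidence of shape (H, W)")

    # Row (y) and column (x) indices of the pixels that pass the threshold.
    ys, xs = np.nonzero(confidence > threshold)

    # Source points in image A, stored as (x, y).
    src = np.stack([xs, ys], axis=1).astype(np.float64)

    # Matched points in image B: source position plus the predicted displacement.
    dst = src + flow[ys, xs]
    return src, dst
```

Under these assumptions, the returned point pairs could be passed to a downstream geometry estimator, while the unfiltered `flow` remains available as the dense result.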

Paper Structure (16 sections, 9 equations, 5 figures, 7 tables)

This paper contains 16 sections, 9 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: RGM overview. We propose a robust generalizable matching model, termed RGM. (a): RGM shows generalizable performance for both sparse and dense matching covering indoor and outdoor scenarios. (b): Our RGM achieves excellent generalization performance, outperforming previous SOTA methods by a clear margin.
  • Figure 2: Diversity of our training data. The collected data incorporates synthesized and real-world data covering indoor and outdoor scenarios (a) with various displacement distributions (b).
  • Figure 3: An overview of our proposed RGM. The learning of dense matching and uncertainty-based sparsification is decoupled into two stages as shown in the upper part while the lower part illustrates the framework of our cascaded matching network.
  • Figure 4: Visualized comparisons. Our RGM shows superior performance by obtaining more matches given a pair of indoor or outdoor images. Moreover, our method shows the potential for robust semantic correspondence as well. The same color indicates the matched features.
  • Figure 5: Visualized comparison. Given a pair of indoor or outdoor images, our RGM exhibits outstanding generalization performance, obtaining denser matches; the same color denotes the same correspondence. Moreover, our method shows the potential for robust semantic correspondence as well.