CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

Zizhuo Li; Yifan Lu; Linfeng Tang; Shihua Zhang; Jiayi Ma

TL;DR

CoMatch addresses the efficiency-accuracy gap in semi-dense image matching by introducing a dynamic covisibility-aware Transformer that selectively condenses and attends over tokens based on covisibility. The method couples a covisibility-guided token condenser with a covisibility-assisted attention to robustly propagate context only from covisible regions, plus a bilateral subpixel refinement stage that optimizes correspondences in both views. Through coarse-to-fine matching with a two-stage bilateral refinement and joint supervision, CoMatch achieves state-of-the-art performance on pose estimation, homography, and visual localization while maintaining competitive speed. The approach demonstrates strong cross-dataset generalization and substantial practical impact for SLAM, SfM, and localization tasks.
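
To make the token-condensing idea concrete, here is a minimal pure-Python sketch of covisibility-weighted pooling. It is an illustration under stated assumptions, not the authors' implementation: the function name `condense_tokens`, the fixed square `window`, and the choice to weight each token by its raw covisibility score normalized within the window are all hypothetical.

```python
def condense_tokens(tokens, covis, window=2, eps=1e-8):
    """Pool each window x window patch of coarse tokens into one token.

    tokens: H x W x C nested lists of floats (coarse feature map).
    covis:  H x W nested lists of covisibility scores in [0, 1].
    Assumption: pooling weights are the scores normalized inside each
    window, so tokens judged non-covisible contribute little.
    """
    height, width = len(tokens), len(tokens[0])
    channels = len(tokens[0][0])
    condensed = []
    for top in range(0, height, window):
        row = []
        for left in range(0, width, window):
            # Cells covered by this window (clipped at the map border).
            cells = [(y, x)
                     for y in range(top, min(top + window, height))
                     for x in range(left, min(left + window, width))]
            total = sum(covis[y][x] for y, x in cells) + eps
            pooled = [
                sum(covis[y][x] * tokens[y][x][c] for y, x in cells) / total
                for c in range(channels)
            ]
            row.append(pooled)
        condensed.append(row)
    return condensed
```

With `window=2`, a 2x2 map collapses to a single token, so later attention layers would operate on a quarter as many tokens. When only one cell in a window has a nonzero score, the pooled token is essentially that cell's feature.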

Abstract

This study proposes CoMatch, a novel semi-dense image matcher with dynamic covisibility awareness and bilateral subpixel accuracy. First, observing that modeling context interaction over the entire coarse feature map incurs highly redundant computation because neighboring tokens have similar representations, we introduce a covisibility-guided token condenser that adaptively aggregates tokens according to dynamically estimated covisibility scores, which keeps computation efficient while improving the representational capacity of the aggregated tokens. Second, considering that feature interaction with large non-covisible areas is distracting and may degrade feature distinctiveness, we deploy a covisibility-assisted attention mechanism that selectively suppresses irrelevant messages broadcast from non-covisible condensed tokens, resulting in robust and compact attention to relevant tokens rather than all of them. Third, we find that at the fine-level stage, current methods adjust only the target view's keypoints to subpixel level, while those in the source view remain restricted to the coarse level and are thus not informative enough, which is detrimental to applications sensitive to keypoint location. We develop a simple yet potent fine correlation module that refines the matching candidates in both source and target views to subpixel level, yielding a notable performance improvement. Thorough experiments across an array of public benchmarks affirm CoMatch's promising accuracy, efficiency, and generalizability.
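
As an illustration of refining both views to subpixel level, the sketch below applies a joint soft-argmax to a correlation matrix between two small fine-level patches. This is a common expectation-based technique and is only assumed here to resemble the fine correlation module; the function name `bilateral_soft_argmax`, the 3x3 `window`, and the `temperature` parameter are hypothetical.

```python
import math


def bilateral_soft_argmax(corr, window=3, temperature=1.0):
    """Estimate subpixel offsets in BOTH views from one correlation matrix.

    corr[i][j]: similarity between source-patch cell i and target-patch
    cell j, with cells enumerated row-major over a window x window patch.
    Returns ((dx_src, dy_src), (dx_tgt, dy_tgt)), offsets in pixels
    measured from each patch center.
    """
    half = (window - 1) / 2.0
    # Offset of every patch cell relative to the patch center.
    offsets = [(x - half, y - half)
               for y in range(window) for x in range(window)]
    n = len(offsets)

    # Numerically stable softmax over all (source cell, target cell) pairs.
    flat = [value / temperature for row in corr for value in row]
    peak = max(flat)
    weights = [math.exp(value - peak) for value in flat]
    norm = sum(weights)

    src = [0.0, 0.0]
    tgt = [0.0, 0.0]
    for i in range(n):
        for j in range(n):
            prob = weights[i * n + j] / norm
            # The expectation of each view's offset gives its refinement.
            src[0] += prob * offsets[i][0]
            src[1] += prob * offsets[i][1]
            tgt[0] += prob * offsets[j][0]
            tgt[1] += prob * offsets[j][1]
    return (src[0], src[1]), (tgt[0], tgt[1])
```

Adding the two returned offsets to the coarse source and target keypoints moves both to subpixel positions, in contrast to schemes that keep the source keypoint fixed on the coarse grid and refine only the target.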
