UFM: Unified Feature Matching Pre-training with Multi-Modal Image Assistants

Yide Di; Yun Liao; Hao Zhou; Kaijun Zhu; Qing Duan; Junhui Liu; Mingyu Lu

TL;DR

UFM tackles unified feature matching across diverse image modalities by introducing a Multimodal Image Assistant (MIA) Transformer, which augments a generic FFN with modality-specific assistants while sharing the attention mechanism across modalities. A data augmentation pipeline and a staged pre-training strategy address data sparsity and modality imbalance, enabling effective fine-tuning for both same-modal and cross-modal tasks. The approach uses a coarse-to-fine dense matching framework trained with epipolar and cycle-consistency losses, and it generalizes well across benchmarks while remaining computationally efficient. The results show competitive or superior performance on both same-modal and cross-modal matching, with practical implications for multimodal vision tasks and downstream applications.
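The idea of a shared attention layer followed by a generic FFN plus a per-modality assistant can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-head attention, the additive combination of the generic FFN and the assistant, the layer sizes, and the modality names "visible" and "infrared" are all assumptions made for illustration.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


class FFN:
    """Two-layer feed-forward network with a ReLU in between."""

    def __init__(self, dim, hidden, rng):
        self.w1 = rng.normal(0.0, 0.02, (dim, hidden))
        self.w2 = rng.normal(0.0, 0.02, (hidden, dim))

    def __call__(self, x):
        return np.maximum(x @ self.w1, 0.0) @ self.w2


class MIABlockSketch:
    """Hypothetical block: shared attention, generic FFN, modality assistants."""

    def __init__(self, dim, modalities, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # Attention weights are shared by every modality.
        self.wq, self.wk, self.wv, self.wo = (
            rng.normal(0.0, 0.02, (dim, dim)) for _ in range(4)
        )
        # One generic FFN, plus one assistant FFN per modality (assumed layout).
        self.generic_ffn = FFN(dim, 4 * dim, rng)
        self.assistants = {m: FFN(dim, 4 * dim, rng) for m in modalities}

    def attention(self, x):
        # Single-head self-attention over the token sequence (a simplification).
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        weights = softmax(q @ k.T / np.sqrt(self.dim))
        return (weights @ v) @ self.wo

    def __call__(self, x, modality):
        x = x + self.attention(layer_norm(x))
        h = layer_norm(x)
        # Assumption: the assistant output is added to the generic FFN output.
        return x + self.generic_ffn(h) + self.assistants[modality](h)


# Usage: the same tokens routed through two different assistants.
block = MIABlockSketch(dim=32, modalities=["visible", "infrared"])
tokens = np.random.default_rng(1).normal(size=(10, 32))
out_vis = block(tokens, "visible")
out_ir = block(tokens, "infrared")
```

In this sketch only the assistant differs between the two calls, so fine-tuning for a new modality would in principle touch a small set of weights while the shared attention and generic FFN stay common; whether UFM combines the branches in exactly this way is not stated in the summary above.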

Abstract

Image feature matching, a foundational task in computer vision, remains challenging for multimodal image applications and often requires intricate training on specific datasets. In this paper, we introduce a Unified Feature Matching pre-trained model (UFM) designed to address feature matching across a wide range of image modalities. We present Multimodal Image Assistant (MIA) transformers, fine-tunable structures that can handle diverse feature matching problems. UFM is versatile: it addresses feature matching both within a single modality and across different modalities. Additionally, we propose a data augmentation algorithm and a staged pre-training strategy to tackle the challenges arising from sparse data in specific modalities and from imbalanced multimodal datasets. Experimental results demonstrate that UFM excels in generalization and performance across various feature matching tasks. The code will be released at: https://github.com/LiaoYun0x0/UFM.
