EDM: Efficient Deep Feature Matching

Xi Li, Tong Rao, Cihui Pan

TL;DR

EDM tackles the efficiency-accuracy tradeoff in detector-free feature matching by redesigning every stage of the pipeline: a deep, low-channel CNN backbone, a Correlation Injection Module (CIM) that fuses feature correlations from global to local, and a lightweight axis-based refinement head for subpixel accuracy. The approach achieves competitive performance on relative pose, homography, and localization benchmarks while substantially reducing inference time, aided by efficient coarse matching and regression strategies. Key contributions include the Correlation Injection Module, the Axis-Based Regression Head with Soft Coordinate Classification, and deployment-friendly selection and loss strategies aimed at real-time applications. The work demonstrates that a comprehensive, efficiency-focused redesign of the detector-free paradigm can deliver strong accuracy without sacrificing deployability on modern hardware.

Abstract

Recent feature matching methods have achieved remarkable performance but lack efficiency consideration. In this paper, we revisit the mainstream detector-free matching pipeline and improve all its stages considering both accuracy and efficiency. We propose an Efficient Deep feature Matching network, EDM. We first adopt a deeper CNN with fewer dimensions to extract multi-level features. Then we present a Correlation Injection Module that conducts feature transformation on high-level deep features, and progressively injects feature correlations from global to local for efficient multi-scale feature aggregation, improving both speed and performance. In the refinement stage, a novel lightweight bidirectional axis-based regression head is designed to directly predict subpixel-level correspondences from latent features, avoiding the significant computational cost of explicitly locating keypoints on high-resolution local feature heatmaps. Moreover, effective selection strategies are introduced to enhance matching accuracy. Extensive experiments show that our EDM achieves competitive matching accuracy on various benchmarks and exhibits excellent efficiency, offering valuable best practices for real-world applications. The code is available at https://github.com/chicleee/EDM.

Paper Structure

This paper contains 31 sections, 9 equations, 11 figures, 13 tables.

Figures (11)

  • Figure 1: Comparison of Matching Accuracy and Latency. Our method achieves competitive accuracy with lower latency. Models are evaluated on the ScanNet dataset to get AUC@$5^{\circ}$ accuracy, while the latency for an image pair with 640$\times$480 resolution is measured on a single NVIDIA 3090 GPU.
  • Figure 2: Pipeline Overview. (a) A deeper CNN backbone is adopted to extract multi-level feature maps. (b) In the Correlation Injection Module (CIM), we alternately apply self-attention and cross-attention a total of $L$ times to capture and transform the correlations between the deep features $F_{d}^{A}$ and $F_{d}^{B}$. Subsequently, two Injection Layers are employed to progressively inject feature correlations from deep to local levels. (c) After the CIM, the coarse features $F_{c}^{A}$ and $F_{c}^{B}$ are flattened and then correlated to produce the similarity matrix. To establish coarse matches, we determine the row-wise maxima in the probability matrix and select the top $K$ values among them. (d) For fine-level matching, the corresponding fine features are extracted using the indices obtained from the coarse matching process. We treat the fine features $F_{q}^{A}$ and $F_{q}^{B}$ as queries, while considering the same features in reversed order, $F_{r}^{B}$ and $F_{r}^{A}$, as references. The query and reference features are encoded separately and then merged. Then, a lightweight regression head estimates the reference offsets on the X and Y axes, respectively. The final matches are obtained by adding the coarse matches to their corresponding offsets.
  • Figure 3: Bidirectional Refinement. For a coarse matching pair, the center point of one grid serves as the query for fine matching, and its corresponding reference point is offset from the center point of the other grid, exhibiting duality.
  • Figure 4: Attention Visualization. (a) Deep correlations. The green dots represent the query points. (b) Injection weights. Significant response values are usually located in detail-rich regions.
  • Figure 5: Qualitative Comparisons. Compared with LoFTR [sun2021loftr] and EfficientLoFTR [wang2024eloftr], our method is more robust in scenarios with large viewpoint changes and repetitive semantics. The red color indicates epipolar error beyond $5\times10^{-4}$ in the normalized image coordinates.
  • ...and 6 more figures
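
The coarse matching and offset steps described in the Figure 2 caption, parts (c) and (d), can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the temperature value, the dual-softmax normalization used to turn the similarity matrix into a probability matrix, and the grid-center convention are all assumptions made for illustration; the caption only specifies a similarity matrix, row-wise maxima, top-$K$ selection, and adding predicted offsets to the coarse matches.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def coarse_match(feat_a, feat_b, top_k, temperature=0.1):
    """Hypothetical helper: feat_a is (N, C), feat_b is (M, C), both flattened."""
    sim = feat_a @ feat_b.T / temperature               # (N, M) similarity matrix
    # Assumption: dual-softmax turns similarities into a probability matrix.
    prob = softmax(sim, axis=1) * softmax(sim, axis=0)
    best_j = prob.argmax(axis=1)                        # row-wise maxima (column index)
    best_p = prob[np.arange(prob.shape[0]), best_j]     # row-wise maxima (value)
    k = min(top_k, best_p.shape[0])
    idx_a = np.argsort(-best_p)[:k]                     # keep the K most confident rows
    return idx_a, best_j[idx_a], best_p[idx_a]

def grid_centers(indices, width, stride):
    """Assumed convention: map flat coarse indices to (x, y) pixel centers."""
    ys, xs = np.divmod(indices, width)
    return np.stack([xs, ys], axis=1) * stride + stride / 2.0

def apply_offsets(coarse_xy, offsets_xy):
    # Final match = coarse match + offset predicted separately for X and Y.
    return coarse_xy + offsets_xy

# Toy usage with random features on a 4x4 coarse grid (stride 8).
rng = np.random.default_rng(0)
fa = rng.normal(size=(16, 32))
fb = rng.normal(size=(16, 32))
ia, ib, conf = coarse_match(fa, fb, top_k=5)
xy_b = apply_offsets(grid_centers(ib, width=4, stride=8), np.zeros((len(ib), 2)))
```

Selecting a fixed number $K$ of row-wise maxima gives the fine stage a constant-size input, which is convenient for deployment; the zero offsets in the toy usage stand in for the regression head's output.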