ConDL: Detector-Free Dense Image Matching

Monika Kwiatkowski, Simon Matern, Olaf Hellwich

TL;DR

ConDL addresses dense image matching without relying on a dedicated keypoint detector: it learns dense pixel-wise descriptors through contrastive learning on SIDAR-based synthetic distortions. It constructs dense feature maps from a ResNet, samples a regular grid of points, and uses a global similarity matrix with a cross-entropy objective to jointly optimize all correspondences. The method is robust to perspective distortions, illumination changes, occlusions, and shadows, and achieves competitive performance against detector-based and other state-of-the-art approaches on synthetically distorted data. This detector-free, modular approach offers a scalable alternative for dense correspondence estimation, with potential for broader applicability and generalization.
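The global similarity matrix and cross-entropy objective mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `contrastive_matching_loss`, the L2 normalization, the `temperature` value, and the symmetric averaging over both matching directions are all assumptions made for illustration. The sketch only assumes that row i of each descriptor array was sampled at the same scene point in both images.

```python
import numpy as np

def contrastive_matching_loss(desc_a, desc_b, temperature=0.1):
    """Cross-entropy over a global similarity matrix (illustrative sketch).

    desc_a, desc_b: (N, D) descriptors sampled at N corresponding points,
    so row i of desc_a is assumed to match row i of desc_b.
    """
    # Assumption: L2-normalize so dot products become cosine similarities.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)

    # Pairwise dot products between all points of both images: shape (N, N).
    sim = a @ b.T / temperature

    def cross_entropy_diag(logits):
        # Row-wise log-softmax (numerically stable); the target class of
        # row i is column i, i.e. the true correspondence.
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Assumption: average the A->B and B->A directions.
    loss = 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim.T))

    # Predicted match for each point in A: the most similar point in B.
    matches = sim.argmax(axis=1)
    return loss, matches

# Toy data: desc_b is a slightly perturbed copy of desc_a.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(16, 32))
desc_b = desc_a + 0.01 * rng.normal(size=(16, 32))
loss, matches = contrastive_matching_loss(desc_a, desc_b)
```

Because every row of the similarity matrix is normalized over all candidate points, each correspondence is optimized against every other sampled point at once, which is one way to read "jointly optimize all correspondences".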

Abstract

In this work, we introduce a deep-learning framework designed for estimating dense image correspondences. Our fully convolutional model generates dense feature maps for images, where each pixel is associated with a descriptor that can be matched across multiple images. Unlike previous methods, our model is trained on synthetic data that includes significant distortions, such as perspective changes, illumination variations, shadows, and specular highlights. Utilizing contrastive learning, our feature maps achieve greater invariance to these distortions, enabling robust matching. Notably, our method eliminates the need for a keypoint detector, setting it apart from many existing image-matching techniques.

Paper Structure (16 sections, 12 equations, 11 figures)

Figures (11)

  • Figure 1: An illustration of the ConDL framework. Dense feature maps are extracted from two images. Keypoints are differentiably sampled from the feature maps. Matches are estimated from similarity scores by calculating pairwise dot-products.
  • Figure 2: (a) shows an input image and (b)-(d) show the created data augmentations.
  • Figure 3: Illustration of sampled point correspondences.
  • Figure 4: Illustration of sampled point correspondences with added noise.
  • Figure 5: The graphs show the cumulative percentage of estimated homographies below a given Mean Corner Error.
  • ...and 6 more figures
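
The evaluation curve described for Figure 5 can be sketched as follows. The listing above does not define the Mean Corner Error, so this sketch assumes the common definition: the mean Euclidean distance between the four image corners warped by the estimated homography and by the ground-truth homography. The helper names (`warp_points`, `mean_corner_error`, `cumulative_curve`) and the corner convention (`width - 1`, `height - 1`) are hypothetical choices for illustration.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
    warped = pts_h @ H.T
    return warped[:, :2] / warped[:, 2:3]             # dehomogenize

def mean_corner_error(H_est, H_gt, width, height):
    """Mean distance between image corners warped by H_est and H_gt."""
    corners = np.array(
        [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]],
        dtype=float,
    )
    diff = warp_points(H_est, corners) - warp_points(H_gt, corners)
    return float(np.linalg.norm(diff, axis=1).mean())

def cumulative_curve(errors, thresholds):
    """Percentage of errors strictly below each threshold."""
    errors = np.asarray(errors, dtype=float)
    return np.array([100.0 * np.mean(errors < t) for t in thresholds])

# Toy check: the estimate is off by a pure translation of (3, 4) pixels,
# so every corner is displaced by 5 pixels.
H_gt = np.eye(3)
H_est = np.array([[1.0, 0.0, 3.0],
                  [0.0, 1.0, 4.0],
                  [0.0, 0.0, 1.0]])
mce = mean_corner_error(H_est, H_gt, width=640, height=480)

# One value per threshold: share of image pairs with error below it.
curve = cumulative_curve([1.0, 5.0, 20.0], thresholds=[2.0, 10.0, 50.0])
```

Plotting `curve` against `thresholds` gives a graph of the kind the caption describes, with one curve per method being compared.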