ConDL: Detector-Free Dense Image Matching
Monika Kwiatkowski, Simon Matern, Olaf Hellwich
TL;DR
ConDL performs dense image matching without a dedicated keypoint detector by learning dense pixel-wise descriptors through contrastive learning on SIDAR-based synthetic distortions. It builds dense feature maps with a ResNet, samples a regular grid of points, and uses a global similarity matrix with a cross-entropy objective to optimize all correspondences jointly. The method is robust to perspective distortions, illumination changes, occlusions, and shadows, and it performs competitively with detector-based and other state-of-the-art approaches on synthetically distorted data. Because it is detector-free and modular, it offers a scalable alternative for dense correspondence estimation, with potential for broader applicability and generalization.
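The training objective summarized above (grid sampling, a global similarity matrix, and a cross-entropy loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the `temperature` parameter, the L2 normalization, and the one-directional (row-wise) loss are all assumptions made for illustration. The sketch also assumes that row i of `desc_a` and row i of `desc_b` are descriptors of the same scene point; obtaining those corresponding locations from the known synthetic distortion is not shown.

```python
import numpy as np


def sample_grid(height, width, step):
    """Return an (N, 2) array of integer (row, col) points on a regular grid."""
    rows = np.arange(step // 2, height, step)
    cols = np.arange(step // 2, width, step)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.stack([rr.ravel(), cc.ravel()], axis=1)


def gather_descriptors(feature_map, points):
    """Pick descriptors at the given points from an (H, W, C) feature map.

    Returns an (N, C) array with L2-normalized rows (normalization is an
    assumption of this sketch).
    """
    desc = feature_map[points[:, 0], points[:, 1]]
    norm = np.linalg.norm(desc, axis=1, keepdims=True)
    return desc / np.maximum(norm, 1e-12)


def contrastive_matching_loss(desc_a, desc_b, temperature=0.1):
    """Cross-entropy over the full similarity matrix between two descriptor sets.

    Entry (i, j) of the matrix compares point i in image A with point j in
    image B. The correct match for row i is assumed to be column i, so the
    loss is the mean negative log-probability of the diagonal entries.
    """
    logits = desc_a @ desc_b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because every row of the similarity matrix is normalized over all candidate points in the other image, each correspondence is trained against all other grid points as negatives at once, which is one way to read "jointly optimize all correspondences".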
Abstract
In this work, we introduce a deep-learning framework for estimating dense image correspondences. Our fully convolutional model generates dense feature maps for images, where each pixel is associated with a descriptor that can be matched across multiple images. Unlike previous methods, our model is trained on synthetic data that includes significant distortions, such as perspective changes, illumination variations, shadows, and specular highlights. Through contrastive learning, the feature maps become more invariant to these distortions, which enables robust matching. Notably, our method eliminates the need for a keypoint detector, setting it apart from many existing image-matching techniques.
