ConKeD: Multiview contrastive descriptor learning for keypoint-based retinal image registration
David Rivas-Villar, Álvaro S. Hervella, José Rouco, Jorge Novo
TL;DR
ConKeD introduces a multi-positive multi-negative contrastive learning framework for keypoint-based retinal image registration. It detects domain-specific keypoints (blood vessel crossovers and bifurcations) via heatmaps and trains dense pixelwise descriptors with multiview batches using losses such as SupCon and MP-InfoNCE, enabling data-efficient learning from limited annotations. The method achieves competitive registration performance on the FIRE benchmark while reducing pre-processing, training data, and the number of detected keypoints, and it operates in a single pass. These results demonstrate a practical, efficient approach for deep learning–based retinal image registration and highlight directions for expanding keypoint coverage and refining loss formulations.
Abstract
Retinal image registration is of utmost importance due to its wide applications in medical practice. In this context, we propose ConKeD, a novel deep learning approach to learn descriptors for retinal image registration. In contrast to current registration methods, our approach employs a novel multi-positive multi-negative contrastive learning strategy that enables the utilization of additional information from the available training samples. This makes it possible to learn high-quality descriptors from limited training data. To train and evaluate ConKeD, we combine these descriptors with domain-specific keypoints, specifically blood vessel bifurcations and crossovers, that are detected using a deep neural network. Our experimental results demonstrate the benefits of the novel multi-positive multi-negative strategy, as it outperforms the widely used triplet loss technique (single-positive and single-negative) as well as the single-positive multi-negative alternative. Additionally, the combination of ConKeD with the domain-specific keypoints produces results comparable to the state-of-the-art methods for retinal image registration, while offering important advantages such as avoiding pre-processing, utilizing fewer training samples, and requiring fewer detected keypoints, among others. Therefore, ConKeD shows promising potential for facilitating the development and application of deep learning-based methods for retinal image registration.
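To make the difference from a triplet loss concrete, the following is a minimal sketch of a SupCon-style multi-positive multi-negative loss over keypoint descriptors. It is not the authors' implementation: the function names, the `keypoint_ids` labelling (descriptors of the same keypoint seen in different views share an id), and the temperature value are assumptions made for illustration. Each anchor descriptor is pulled towards all of its positives and pushed away from every other descriptor in the batch, whereas a triplet loss would use only one positive and one negative per anchor.

```python
import math


def l2_normalize(vector):
    """Scale a descriptor to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(descriptors, keypoint_ids, temperature=0.1):
    """SupCon-style loss (illustrative sketch, not the paper's code).

    descriptors:  list of descriptor vectors, one per keypoint per view.
    keypoint_ids: hypothetical labels; equal ids mark the same keypoint
                  observed in different views (the positives).
    temperature:  assumed value, chosen only for this example.
    """
    z = [l2_normalize(d) for d in descriptors]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Scaled cosine similarity between anchor i and every descriptor.
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n)]
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if keypoint_ids[j] == keypoint_ids[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Log-sum-exp over all non-anchor samples, shifted for stability.
        m = max(sims[j] for j in others)
        log_denom = m + math.log(sum(math.exp(sims[j] - m) for j in others))
        # Average the negative log-probability over all positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Two keypoints, each seen in two views (toy 2-D descriptors).
    toy_descriptors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supcon_loss(toy_descriptors, [0, 0, 1, 1]))
```

In this sketch, every additional view of a keypoint adds a term to the numerator sum, which is how a multiview batch lets the loss use more of the information in each training sample.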
