Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging
Jiacheng Wang, Hao Li, Dewei Hu, Rui Xu, Xing Yao, Yuankai K. Tao, Ipek Oguz
TL;DR
This work addresses cross-modality retinal image alignment amid substantial domain shifts by introducing RetinaIPA, a self-/semi-supervised framework that iteratively refines keypoint detections and learns cross-modality features. It combines a multi-task segmentation head, a keypoint-augmented self-supervised (SSL) feature-map layer, and iterative keypoint training to produce robust, modality-agnostic feature matching and registration across fundus, FA, OCT-A, and SLO images. Extensive experiments on the FIRE, CF-FA, and OCT-SLO datasets demonstrate superior alignment accuracy and reduced shadowing compared with state-of-the-art detector-based and detector-free methods, with ablations showing the additive benefit of each contribution. The approach offers practical impact for ultra-wide-field retinal mosaicking and multi-modal diagnostics, and the authors provide public code and weights for reuse and benchmarking.
Abstract
We propose a novel framework for retinal feature point alignment, designed for learning cross-modality features to enhance matching and registration across multi-modality retinal images. Our model draws on the success of previous learning-based feature detection and description methods. To better leverage unlabeled data and constrain the model to reproduce relevant keypoints, we integrate a keypoint-based segmentation task. This task is trained in a self-supervised manner by enforcing segmentation consistency between different augmentations of the same image. By incorporating a keypoint-augmented self-supervised layer, we achieve robust feature extraction across modalities. Extensive evaluation on two public datasets and one in-house dataset demonstrates significant improvements in performance for modality-agnostic retinal feature alignment. Our code and model weights are publicly available at https://github.com/MedICL-VU/RetinaIPA.
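To make the idea of segmentation consistency between augmentations concrete, the following is a minimal NumPy sketch and not the released implementation. The abstract does not specify the network, the augmentations, or the loss, so everything here is an illustrative assumption: `toy_segmenter` and `biased_segmenter` are hypothetical stand-ins for a segmentation head, `hflip` stands in for an augmentation, and mean squared error stands in for the consistency loss.

```python
import numpy as np


def toy_segmenter(image):
    """Hypothetical stand-in for a segmentation head (assumption, not the paper's model).

    Applies a pixel-wise sigmoid to the mean-centered intensity, so the
    output commutes with any spatial permutation such as a flip.
    """
    return 1.0 / (1.0 + np.exp(-(image - image.mean())))


def biased_segmenter(image):
    """A deliberately position-dependent variant that is NOT flip-consistent."""
    ramp = np.linspace(0.0, 1.0, image.shape[1])[None, :]
    return toy_segmenter(image) * ramp


def hflip(x):
    """Illustrative augmentation: horizontal flip."""
    return x[:, ::-1]


def consistency_loss(segment, image, augment):
    """MSE between segment(augment(image)) and augment(segment(image)).

    A value near zero means the prediction is consistent under the
    augmentation; larger values would be penalized during training.
    """
    pred_of_aug = segment(augment(image))
    aug_of_pred = augment(segment(image))
    return float(np.mean((pred_of_aug - aug_of_pred) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))
    print("consistent model loss:  ", consistency_loss(toy_segmenter, img, hflip))
    print("inconsistent model loss:", consistency_loss(biased_segmenter, img, hflip))
```

In this sketch the flip-equivariant model yields a loss that is zero up to floating-point error, while the position-dependent model yields a clearly positive loss. That gap is the signal a consistency objective of this kind would use on unlabeled images.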
