Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging

Jiacheng Wang, Hao Li, Dewei Hu, Rui Xu, Xing Yao, Yuankai K. Tao, Ipek Oguz

TL;DR

This work addresses cross-modality retinal image alignment under substantial domain shift by introducing Retinal IPA, a self-/semi-supervised framework that iteratively refines keypoint detections and learns cross-modality features. It combines a multi-task design with an auxiliary segmentation head, a keypoint-augmented self-supervised (SSL) layer on the feature maps, and iterative keypoint training to produce robust, modality-agnostic feature matching and registration across fundus, FA, OCT-A, and SLO images. Experiments on the FIRE, CF-FA, and OCT-SLO datasets show higher alignment accuracy and less shadowing (a sign of mismatched vessels) than state-of-the-art detector-based and detector-free methods, and ablations show that each contribution adds to the result. The approach is relevant to ultra-wide-field retinal mosaicking and multi-modal diagnostics, and the authors release code and model weights for reuse and benchmarking.

Abstract

We propose a novel framework for retinal feature point alignment, designed for learning cross-modality features to enhance matching and registration across multi-modality retinal images. Our model draws on the success of previous learning-based feature detection and description methods. To better leverage unlabeled data and constrain the model to reproduce relevant keypoints, we integrate a keypoint-based segmentation task. It is trained in a self-supervised manner by enforcing segmentation consistency between different augmentations of the same image. By incorporating a keypoint-augmented self-supervised layer, we achieve robust feature extraction across modalities. Extensive evaluation on two public datasets and one in-house dataset demonstrates significant improvements in performance for modality-agnostic retinal feature alignment. Our code and model weights are publicly available at https://github.com/MedICL-VU/RetinaIPA.
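
The segmentation-consistency idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the horizontal-flip augmentation, the mean-squared-difference loss, and the names `toy_predict` and `consistency_loss` are all assumptions chosen for illustration, and the abstract does not specify the augmentations or the loss form.

```python
import numpy as np

def toy_predict(image):
    """Hypothetical stand-in for a segmentation head: a pixelwise sigmoid."""
    return 1.0 / (1.0 + np.exp(-(image - 0.5) * 10.0))

def consistency_loss(predict, image):
    """Mean squared difference between the predictions for an image and for
    its horizontally flipped copy (assumed augmentation, assumed loss)."""
    pred_a = predict(image)
    pred_b = predict(image[:, ::-1])
    # Undo the flip so both maps share the same coordinate frame.
    pred_b_aligned = pred_b[:, ::-1]
    return float(np.mean((pred_a - pred_b_aligned) ** 2))

rng = np.random.default_rng(0)
image = rng.random((8, 8))

# A pixelwise predictor is flip-consistent, so the loss is (numerically) zero.
loss_consistent = consistency_loss(toy_predict, image)

# A predictor whose output depends on column position is penalized.
def biased_predict(im):
    return toy_predict(im) * np.linspace(0.0, 1.0, im.shape[1])

loss_biased = consistency_loss(biased_predict, image)
```

No labels enter the loss, which is why a term of this kind can be trained on unlabeled images.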

Paper Structure (12 sections, 1 equation, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The overall framework for Retinal IPA. The bottom orange panel represents our keypoint-augmented (KA) layer, where we concatenate the outputs of each layer to compute the contrastive loss shown in the pink stacks. The dashed boxes represent the multi-task framework, with detection, description, and auxiliary segmentation tasks. In each iteration, we use the current feature predictions to guide training.
  • Figure 2: Feature detection. First three columns: single-modality FIRE dataset. Last three columns: OCT-SLO dataset. Green stars: matched points. Blue circles: detected features. SIFT fails on both datasets. SuperRetina produces plausible results, but our model finds more matching pairs on each dataset.
  • Figure 3: Registration results. Each row is representative of a different dataset. The red channel shows the moving image after alignment ($M(I_m)$), and the green channel shows the fixed image ($I_f$). The dashed boxes provide a zoomed-in view for better visibility. We observe that our method outperforms the other two methods, which show shadowing indicating mismatched vessels.
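
The red/green overlay described in the Figure 3 caption can be reproduced with a short sketch. It assumes grayscale images scaled to [0, 1] and a moving image that has already been warped; the function name `red_green_overlay` is hypothetical and is not taken from the authors' code.

```python
import numpy as np

def red_green_overlay(moving_aligned, fixed):
    """Stack two grayscale images into one RGB overlay.

    Red channel: the moving image after alignment, M(I_m).
    Green channel: the fixed image, I_f.
    Blue channel: zeros.
    """
    if moving_aligned.shape != fixed.shape:
        raise ValueError("images must have the same shape")
    blue = np.zeros_like(fixed)
    return np.stack([moving_aligned, fixed, blue], axis=-1)

# Toy "vessel": one bright horizontal line in a 5x5 image.
fixed = np.zeros((5, 5))
fixed[2, :] = 1.0

# Perfect alignment: red and green coincide, so the vessel appears yellow.
aligned_overlay = red_green_overlay(fixed, fixed)

# One-pixel misalignment: separate red and green lines, i.e. shadowing.
shifted = np.roll(fixed, 1, axis=0)
misaligned_overlay = red_green_overlay(shifted, fixed)
```

In such an overlay, well-registered vessels appear yellow, while distinct red and green copies of a vessel are the shadowing the caption refers to.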