A 3D Cross-modal Keypoint Descriptor for MR-US Matching and Registration

Daniil Morozov, Reuben Dorent, Nazim Haouchine

Abstract

Intraoperative registration of real-time ultrasound (iUS) to preoperative Magnetic Resonance Imaging (MRI) remains an unsolved problem due to severe modality-specific differences in appearance, resolution, and field-of-view. To address this, we propose a novel 3D cross-modal keypoint descriptor for MRI-iUS matching and registration. Our method follows a patient-specific matching-by-synthesis strategy: synthetic iUS volumes are generated from the preoperative MRI, which enables supervised contrastive training of a shared descriptor space. A probabilistic keypoint detection strategy then identifies anatomically salient and modality-consistent locations. During training, a curriculum-based triplet loss with dynamic hard negative mining is used to learn descriptors that are i) robust to iUS artifacts such as speckle noise and limited coverage, and ii) rotation-invariant. At inference, the method detects keypoints in MR and real iUS images and identifies sparse matches, which are then used to perform rigid registration. We evaluate our approach on 3D MRI-iUS pairs from the ReMIND dataset. Experiments show that it outperforms state-of-the-art keypoint matching methods across 11 patients, with an average precision of 69.8%. For image registration, our method achieves a competitive mean Target Registration Error of 2.39 mm on the ReMIND2Reg benchmark. Compared to existing MRI-iUS registration approaches, our framework is interpretable, requires no manual initialization, and is robust to iUS field-of-view variation. Code, data and model weights are available at https://github.com/morozovdd/CrossKEY.
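
The last step of the pipeline, rigid registration from sparse keypoint matches, can be illustrated with a least-squares fit. The abstract does not say which estimator is used, so the sketch below assumes the standard Kabsch (SVD-based) solution and synthetic, outlier-free matches. The names `fit_rigid` and `target_registration_error` are hypothetical and are not taken from the CrossKEY code.

```python
import numpy as np


def fit_rigid(src, dst):
    """Least-squares rotation R and translation t such that dst ~ src @ R.T + t."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance between the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def target_registration_error(R, t, landmarks_src, landmarks_dst):
    """Mean Euclidean distance between mapped source and target landmarks."""
    mapped = np.asarray(landmarks_src) @ R.T + t
    return float(np.linalg.norm(mapped - np.asarray(landmarks_dst), axis=1).mean())


# Synthetic example (assumed data): a 30-degree rotation about z plus a shift.
rng = np.random.default_rng(0)
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([5.0, -2.0, 1.5])

kp_us = rng.uniform(-50.0, 50.0, size=(20, 3))   # matched keypoints in iUS space
kp_mr = kp_us @ R_true.T + t_true                # their counterparts in MR space
R_est, t_est = fit_rigid(kp_us, kp_mr)

landmarks_us = rng.uniform(-50.0, 50.0, size=(5, 3))
landmarks_mr = landmarks_us @ R_true.T + t_true
tre = target_registration_error(R_est, t_est, landmarks_us, landmarks_mr)
print(f"TRE on synthetic landmarks: {tre:.2e} mm")
```

With real matches some correspondences will be wrong, so a robust wrapper such as RANSAC around this fit would typically be needed.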

Paper Structure

This paper contains 27 sections, 2 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: 3D cross-modal keypoint matching between MR and iUS volumes. Bottom: Matched local patches around a keypoint pair from MR and iUS images with corresponding descriptor curves showing strong similarity agreement.
  • Figure 2: Method overview. (a) Synthetic iUS volumes are generated from preoperative MRI using MMHVAE. (b) A cross-modal saliency map $P_{\text{res}}$ is constructed by aggregating keypoint statistics from synthetic iUS and MRI, then modulated by a spatial prior $M_w$. (c) A Siamese network is trained with triplet loss on multi-modal patch pairs to produce cross-modal descriptors. (d) Descriptor matching is performed using nearest-neighbor search, followed by a partial assignment between sampled keypoints in MRI and iUS. Keypoints are sampled from the learned saliency distribution in MRI and uniformly in the real iUS.
  • Figure 3: Synthetic US images generated for three different T2 MR images (one case per row) using MMHVAE (Dorent et al., 2024). The first column shows the T2 MR image; the middle columns show samples of synthetic US images generated using different combinations of T2, T1, and FLAIR with different speckle patterns; the last column shows the ground-truth US image.
  • Figure 4: Examples of MR-iUS patches showing high descriptor similarity for positive pairs (left) and low descriptor similarity (right) for negative pairs. The $d$-dimensional feature vectors were sorted according to the values of the MR descriptor $\mathbf{d}^{\text{MR}}$, highlighting the correlation between MR and iUS descriptors.
  • Figure 5: Qualitative matching results across three cases (columns). Rows 1–5 show results on slices from the 5 best-performing methods. Green lines indicate correct matches; red dots denote mismatches. Last row shows volume rendering with matching using our descriptor.
  • ...and 4 more figures
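
Panel (d) of Figure 2 describes nearest-neighbor descriptor matching followed by a partial assignment between MRI and iUS keypoints. As a rough illustration only, the sketch below uses a mutual nearest-neighbor check on cosine similarity as a simple stand-in for the partial assignment, which the caption does not detail. The function name `mutual_nearest_neighbors` and the toy descriptors are assumptions, not part of the paper.

```python
import numpy as np


def mutual_nearest_neighbors(desc_mr, desc_us):
    """Return (mr_index, us_index) pairs that are each other's nearest neighbor."""
    # L2-normalize so the dot product equals cosine similarity.
    a = desc_mr / np.linalg.norm(desc_mr, axis=1, keepdims=True)
    b = desc_us / np.linalg.norm(desc_us, axis=1, keepdims=True)
    sim = a @ b.T                      # shape (n_mr, n_us)
    nn_mr_to_us = sim.argmax(axis=1)   # best iUS keypoint for each MR keypoint
    nn_us_to_mr = sim.argmax(axis=0)   # best MR keypoint for each iUS keypoint
    mr_idx = np.arange(len(a))
    # Keep an MR keypoint only if its best iUS match points back to it.
    keep = nn_us_to_mr[nn_mr_to_us] == mr_idx
    return np.stack([mr_idx[keep], nn_mr_to_us[keep]], axis=1)


# Toy descriptors (assumed data): 5 MR keypoints, 3 of which are visible in iUS.
desc_mr = np.eye(5, 8)
desc_us = desc_mr[[2, 0, 4]] + 0.01    # slightly perturbed copies of MR rows 2, 0, 4
matches = mutual_nearest_neighbors(desc_mr, desc_us)
print(matches)
```

MR keypoints without a counterpart in the iUS volume are left unmatched, which mirrors the partial nature of the assignment when the iUS field of view covers only part of the MRI.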