Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. Mok, Zi Li, Yunhao Bai, Jianpeng Zhang, Wei Liu, Yan-Jie Zhou, Ke Yan, Dakai Jin, Yu Shi, Xiaoli Yin, Le Lu, Ling Zhang
TL;DR
This work tackles the challenge of deformable multimodal medical image registration by learning modality-agnostic deep structural image representations (DSIR) through a self-supervised framework. It introduces Deep Neighbourhood Self-similarity (DNS) to capture discriminative, long-range structural information from feature maps, and anatomy-aware contrastive learning with stochastic non-linear intensity transformations to enforce modality-invariant, location-specific similarity. The MASR-Net architecture produces compact DSIRs that can drive either learning-based or instance-specific optimization-based registration, achieving state-of-the-art or competitive results across liver multiphase CT, abdomen MR-CT, and brain MR T1w-T2w tasks without requiring annotated labels or perfectly aligned training pairs. The approach demonstrates strong robustness to large deformations and modality-specific intensity variations, offering a flexible, practical solution for diverse clinical registration needs and downstream analyses.
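The sketch below illustrates roughly what a neighbourhood self-similarity descriptor computes. It applies a MIND-style formulation to a generic feature map under stated assumptions and is not the authors' DNS. The function name `neighbourhood_self_similarity`, the dilated offset set, the mean squared feature distance, and the edge padding are all hypothetical choices. Long-range context comes from large offsets, not only from immediate neighbours.

```python
import numpy as np


def neighbourhood_self_similarity(feat, offsets, eps=1e-6):
    """Hypothetical MIND-style self-similarity over a (C, H, W) feature map.

    Returns a (len(offsets), H, W) descriptor. Each channel measures how
    similar a pixel's feature vector is to the one at a displaced location.
    """
    c, h, w = feat.shape
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    # Edge padding (an assumption) so displaced reads stay inside the array.
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    dists = []
    for dy, dx in offsets:
        shifted = padded[:, pad + dy : pad + dy + h, pad + dx : pad + dx + w]
        # Mean squared feature distance between x and x + (dy, dx).
        dists.append(((feat - shifted) ** 2).mean(axis=0))
    d = np.stack(dists)                               # (K, H, W)
    variance = d.mean(axis=0, keepdims=True) + eps    # local scale estimate
    s = np.exp(-d / variance)
    # Normalise so the most similar neighbour scores 1 at every pixel.
    return s / s.max(axis=0, keepdims=True)


# Usage: dilated 4-neighbourhoods at three scales (hypothetical choice).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 32, 32))
offsets = [(s * dy, s * dx) for s in (1, 4, 8)
           for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))]
desc = neighbourhood_self_similarity(feat, offsets)
print(desc.shape)  # (12, 32, 32)
```

Each distance is divided by the local mean distance. As a result, scaling or shifting all feature values leaves the descriptor essentially unchanged, which is why self-similarity is attractive when contrast differs between scans.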
Abstract
Establishing dense anatomical correspondence across distinct imaging modalities is a foundational yet challenging procedure for numerous medical image analysis studies and for image-guided radiotherapy. Existing multi-modality image registration algorithms rely on statistics-based similarity measures or local structural image representations. However, the former are sensitive to locally varying noise, while the latter are not discriminative enough to cope with complex anatomical structures in multimodal scans, which causes ambiguity in determining anatomical correspondence across scans of different modalities. In this paper, we propose a modality-agnostic structural representation learning method that leverages Deep Neighbourhood Self-similarity (DNS) and anatomy-aware contrastive learning to learn discriminative and contrast-invariant deep structural image representations (DSIR), without requiring anatomical delineations or pre-aligned training images. We evaluate our method on multiphase CT, abdomen MR-CT, and brain MR T1w-T2w registration. Comprehensive results demonstrate that our method outperforms conventional local structural representations and statistics-based similarity measures in terms of discriminability and accuracy.
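The contrastive objective can be illustrated with a generic location-wise InfoNCE loss paired with a random intensity remapping. This is a hedged sketch and does not reproduce the paper's anatomy-aware formulation. The helper names `random_intensity_transform` and `location_infonce`, the piecewise-linear curve standing in for the stochastic non-linear intensity transformation, the temperature, and the choice to treat every other sampled location as a negative are all assumptions. The idea is that embeddings taken at the same spatial location in two differently contrasted views should agree, and embeddings at other locations should not.

```python
import numpy as np


def random_intensity_transform(img, rng, n_knots=5):
    """Hypothetical stochastic non-linear intensity remapping.

    Maps intensities (assumed normalised to [0, 1]) through a random
    piecewise-linear curve, so the same anatomy appears with a different,
    possibly non-monotonic, contrast.
    """
    xs = np.linspace(0.0, 1.0, n_knots)
    ys = rng.uniform(0.0, 1.0, n_knots)
    return np.interp(img, xs, ys)


def location_infonce(z_a, z_b, temperature=0.1):
    """InfoNCE over N sampled locations: row i of z_a should match row i of z_b.

    z_a, z_b: (N, D) embeddings taken at the same N spatial locations from
    two intensity-augmented views of one image.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal (same location in both views).
    return -np.mean(np.diag(log_prob))


# Usage with stand-in embeddings; a real encoder would produce z from each view.
rng = np.random.default_rng(0)
img = rng.uniform(size=(64, 64))
view = random_intensity_transform(img, rng)   # same layout, new contrast
z = rng.standard_normal((16, 32))
print(location_infonce(z, z) < location_infonce(z, np.roll(z, 1, axis=0)))  # True
```

The random curve may be non-monotonic, so an encoder trained this way cannot rely on intensity ordering being preserved between views and must instead match locations by their surrounding structure.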
