Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

Tony C. W. Mok, Zi Li, Yunhao Bai, Jianpeng Zhang, Wei Liu, Yan-Jie Zhou, Ke Yan, Dakai Jin, Yu Shi, Xiaoli Yin, Le Lu, Ling Zhang

TL;DR

This work tackles the challenge of deformable multimodal medical image registration by learning modality-agnostic deep structural image representations (DSIR) through a self-supervised framework. It introduces Deep Neighbourhood Self-similarity (DNS) to capture discriminative, long-range structural information from feature maps, and anatomy-aware contrastive learning with stochastic non-linear intensity transformations to enforce modality-invariant, location-specific similarity. The MASR-Net architecture produces compact DSIRs that can drive either learning-based or instance-specific optimization-based registration, achieving state-of-the-art or competitive results across liver multiphase CT, abdomen MR-CT, and brain MR T1w-T2w tasks without requiring annotated labels or perfectly aligned training pairs. The approach demonstrates strong robustness to large deformations and modality-specific intensity variations, offering a flexible, practical solution for diverse clinical registration needs and downstream analyses.
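The stochastic non-linear intensity transformation mentioned above can be illustrated with a small sketch. It is not the authors' implementation: the function name `random_intensity_transform`, the piecewise-linear curve, the number of knots, and the optional inversion are all assumptions chosen for illustration. The idea shown is only that intensities are remapped randomly while the spatial layout of the image stays untouched, so a representation trained on such pairs has to rely on structure rather than on absolute intensity.

```python
import random


def random_intensity_transform(image, num_knots=4, invert_prob=0.5, rng=None):
    """Remap intensities in [0, 1] with a random piecewise-linear curve.

    Hypothetical stand-in for a stochastic non-linear intensity
    transformation: a random monotonic curve that is optionally flipped
    to mimic contrast inversion between modalities.
    """
    rng = rng or random.Random()
    # Random monotonic control points running from (0, 0) to (1, 1).
    xs = [0.0] + sorted(rng.random() for _ in range(num_knots)) + [1.0]
    ys = [0.0] + sorted(rng.random() for _ in range(num_knots)) + [1.0]
    if rng.random() < invert_prob:
        ys = [1.0 - y for y in ys]  # bright becomes dark and vice versa

    def remap(v):
        v = min(max(v, 0.0), 1.0)  # clamp to the valid intensity range
        for i in range(len(xs) - 1):
            if v <= xs[i + 1]:
                span = xs[i + 1] - xs[i]
                t = (v - xs[i]) / span if span > 0 else 0.0
                return ys[i] + t * (ys[i + 1] - ys[i])
        return ys[-1]

    # Only intensities change; pixel positions are left as they are.
    return [[remap(v) for v in row] for row in image]


image = [[0.0, 0.25], [0.5, 1.0]]
augmented = random_intensity_transform(image, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the augmentation reproducible, which helps when comparing representations of the same image under different remappings.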

Abstract

Establishing dense anatomical correspondence across distinct imaging modalities is a foundational yet challenging procedure for numerous medical image analysis studies and image-guided radiotherapy. Existing multi-modality image registration algorithms rely on statistical-based similarity measures or local structural image representations. However, the former are sensitive to locally varying noise, while the latter are not discriminative enough to cope with complex anatomical structures in multimodal scans, causing ambiguity in determining the anatomical correspondence across scans with different modalities. In this paper, we propose a modality-agnostic structural representation learning method, which leverages Deep Neighbourhood Self-similarity (DNS) and anatomy-aware contrastive learning to learn discriminative and contrast-invariant deep structural image representations (DSIR) without the need for anatomical delineations or pre-aligned training images. We evaluate our method on multiphase CT, abdomen MR-CT, and brain MR T1w-T2w registration. Comprehensive results demonstrate that our method is superior to the conventional local structural representation and statistical-based similarity measures in terms of discriminability and accuracy.
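To make the notion of neighbourhood self-similarity on a feature map more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the DNS formulation from the paper: cosine similarity, a 2D feature map, a four-point neighbourhood, and border clamping are all choices made here, and the names `cosine` and `neighbourhood_self_similarity` are hypothetical. Each location is described by how similar its feature vector is to the feature vectors of its neighbours, so the descriptor depends on local structure rather than on the raw feature values.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def neighbourhood_self_similarity(
    features, offsets=((0, 1), (1, 0), (0, -1), (-1, 0))
):
    """Return descriptor[y][x], one similarity value per neighbour offset.

    features[y][x] is a feature vector (list of floats). The offsets and
    the similarity function are assumptions made for this sketch.
    """
    height, width = len(features), len(features[0])
    descriptors = []
    for y in range(height):
        row = []
        for x in range(width):
            desc = []
            for dy, dx in offsets:
                # Clamp neighbour coordinates so border pixels stay in range.
                ny = min(max(y + dy, 0), height - 1)
                nx = min(max(x + dx, 0), width - 1)
                desc.append(cosine(features[y][x], features[ny][nx]))
            row.append(desc)
        descriptors.append(row)
    return descriptors


features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 0.0]],
]
descriptors = neighbourhood_self_similarity(features)
```

Because cosine similarity ignores vector magnitude, rescaling every feature vector by the same positive factor leaves this descriptor unchanged; a learned representation would need to go further and tolerate the non-linear intensity changes seen between modalities.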

Paper Structure

This paper contains 40 sections, 5 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Visualization of feature similarity between the marked feature vector (red dot) of the image and all feature vectors of augmented images using a convolutional neural network without pretraining (CNN), the Modality Independent Neighbourhood Descriptor (MIND), and our proposed Deep Neighbourhood Self-similarity (DNS). Our method captures a contrast-invariant and highly discriminative structural representation of the image, reducing the ambiguity in matching the anatomical correspondence between multimodal images.
  • Figure 2: Overview of the Modality-Agnostic Deep Structural Representation Network (MASR-Net) and anatomy-aware contrastive learning paradigm. For brevity, we visualize the 3D feature maps in a 2D aspect. Only negative pairs of the first feature vector are shown.
  • Figure 3: Quantitative results on liver multi-phase CT registration task. $X \leftarrow Y$ represents the experiment of registering the CT scan in the $Y$ phase to the CT scan in the $X$ phase of the same patient. $\uparrow$: higher is better, and $\downarrow$: lower is better. Initial: initial results without registration. The registration runtime highlighted with an asterisk is reported in CPU time (in seconds).
  • Figure 4: Example slices of resulting warped images using different registration methods on liver multiphase CT, abdomen MR-CT, and brain MR T1w-T2w registration tasks. The warped anatomical segmentations are overlaid on the resulting images. Major registration artefacts are highlighted with yellow arrows. Multiphase CT: tumour (green) and liver (red). Abdomen MR-CT: spleen (green), liver (red), right kidney (yellow), and left kidney (blue). MR T1w-T2w: grey matter (green), white matter (blue) and cerebrospinal fluid (red).
  • Figure 5: Visualization of the feature similarity maps across different modalities and anatomies. Random Init.: DNS without training. DNS$_{\text{smooth}}$: DNS with Gaussian smoothing. Each heatmap shows the similarity of the marked point (red dot) on the source image to every point in the target image. The features extracted with our method (DNS and DNS$_{\text{smooth}}$) show high discriminability at the boundary of the liver tumour ($1^{\text{st}}$ row), the kidney ($2^{\text{nd}}$ row) and the anatomical structure of brain MR with large intensity variation ($3^{\text{rd}}$ row).
  • ...and 9 more figures