LDM-Morph: Latent diffusion model guided deformable image registration

Jiong Wu, Kuang Gong

TL;DR

On four public 2D cardiac image datasets, the proposed LDM-Morph framework outperformed existing state-of-the-art CNN- and Transformer-based registration methods in accuracy and topology preservation, with comparable computational efficiency.

Abstract

Deformable image registration plays an essential role in various medical imaging tasks. Existing deep learning-based deformable registration frameworks primarily use convolutional neural networks (CNNs) or Transformers to learn the features from which deformations are predicted. However, the lack of semantic information in the learned features limits registration performance. Furthermore, the similarity metric in the loss function is often evaluated only in pixel space, which ignores the matching of high-level anatomical features and can lead to deformation folding. To address these issues, we propose LDM-Morph, an unsupervised deformable registration algorithm for medical images. LDM-Morph integrates features extracted from a latent diffusion model (LDM) to enrich the semantic information. Additionally, a latent and global feature-based cross-attention (LGCA) module is designed to enhance the interaction between semantic information from the LDM and global information from multi-head self-attention operations. Finally, a hierarchical metric is proposed to evaluate the similarity of image pairs in both the original pixel space and the latent-feature space, enhancing topology preservation while improving registration accuracy. Extensive experiments on four public 2D cardiac image datasets show that the proposed LDM-Morph framework outperforms existing state-of-the-art CNN- and Transformer-based registration methods in accuracy and topology preservation, with comparable computational efficiency. Our code is publicly available at https://github.com/wujiong-hub/LDM-Morph.
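
The abstract refers to deformation folding and topology preservation without saying how they are measured. A common way to quantify folding in the registration literature is the fraction of grid locations where the Jacobian determinant of the deformation is non-positive. The sketch below illustrates that generic check for a 2D displacement field. It is not taken from the LDM-Morph code: the function names, the nested-list layout, and the forward-difference scheme are assumptions made for illustration.

```python
def jacobian_determinants(disp_y, disp_x):
    """Jacobian determinant of phi(p) = p + u(p) on a 2D grid.

    disp_y[i][j] and disp_x[i][j] are displacements in pixels along the
    row and column axes (hypothetical layout, chosen for this sketch).
    Forward differences are used, so the result has shape (H-1) x (W-1).
    """
    height, width = len(disp_y), len(disp_y[0])
    dets = []
    for i in range(height - 1):
        row = []
        for j in range(width - 1):
            # Partial derivatives of the displacement u by forward differences.
            duy_dy = disp_y[i + 1][j] - disp_y[i][j]
            duy_dx = disp_y[i][j + 1] - disp_y[i][j]
            dux_dy = disp_x[i + 1][j] - disp_x[i][j]
            dux_dx = disp_x[i][j + 1] - disp_x[i][j]
            # Jacobian of phi is the identity plus the Jacobian of u.
            row.append((1.0 + duy_dy) * (1.0 + dux_dx) - duy_dx * dux_dy)
        dets.append(row)
    return dets


def folding_fraction(disp_y, disp_x):
    """Fraction of grid locations with a non-positive Jacobian determinant."""
    dets = [d for row in jacobian_determinants(disp_y, disp_x) for d in row]
    return sum(1 for d in dets if d <= 0.0) / len(dets)


# A zero displacement is the identity map and has no folding.
zero = [[0.0] * 4 for _ in range(4)]
# A displacement of -2*j along columns mirrors the grid, so it folds everywhere.
mirrored = [[-2.0 * j for j in range(4)] for _ in range(4)]

print(folding_fraction(zero, zero))
print(folding_fraction(zero, mirrored))
```

A lower fraction indicates better topology preservation. A determinant of exactly 1 everywhere corresponds to a locally area-preserving map such as the identity.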

Paper Structure

This paper contains 27 sections, 18 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Overview of the proposed deformable registration framework (LDM-Morph). It comprises three main components: a latent diffusion model (LDM)-based latent feature extraction (LDM-FE) module (shown at the top, detailed in Sec. \ref{LDM}), a dual-stream feature learning-based encoder (detailed in Sec. \ref{Dual}), and a decoder. The LDM is pre-trained. The features extracted by the encoder of the U-Net in the LDM are then used in the dual-stream feature learning-based encoder. The estimated deformation field $\phi$ is output at the same resolution as the original images.
  • Figure 2: The architecture of the global feature extraction module.
  • Figure 3: The architecture of the proposed latent and global feature cross-attention (LGCA) module.
  • Figure 4: Box plots of DSC values for the registration results obtained by different methods on the CAMUS-2CH, CAMUS-4CH, ECHO, and ACDC datasets. The red line denotes the median. The upper and lower boundaries of each box represent the upper and lower quartiles, respectively. The whiskers indicate the range of the data.
  • Figure 5: Visual comparison of registration methods on the ECHO dataset. The first column displays: the fixed image with segmentation overlay, the moving image with segmentation overlay, and the difference between the moving and fixed images. Subsequent columns present results from SyN, LDDMM, VoxelMorph, CycleMorph, DiffusionMorph, TransMorph, TransMatch, and the proposed LDM-Morph. The DSC values are shown in the upper left corner. From top to bottom: the warped moving images with overlaid warped segmentation, estimated deformations (in RGB), estimated deformations (as grids), and the differences between the warped moving images and the fixed image.
  • ...and 5 more figures
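
Figures 4 and 5 report registration accuracy as DSC values, which conventionally denotes the Dice similarity coefficient between the warped moving segmentation and the fixed segmentation: DSC = 2|A ∩ B| / (|A| + |B|). The sketch below computes it for two binary masks. The function name, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration and are not taken from the LDM-Morph code.

```python
def dice_score(mask_a, mask_b):
    """Dice similarity coefficient between two binary 2D masks.

    Masks are nested lists of 0/1 values with identical shapes
    (hypothetical format, chosen for this sketch).
    """
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Both masks are empty; treating this as perfect overlap is a convention.
        return 1.0
    return 2.0 * intersection / total


# Toy example: the fixed mask has 4 foreground pixels, the warped mask has 3,
# and they share 3, so DSC = 2 * 3 / (4 + 3) = 6/7.
fixed_mask = [[0, 1, 1],
              [0, 1, 1]]
warped_mask = [[0, 0, 1],
               [0, 1, 1]]
print(dice_score(fixed_mask, warped_mask))
```

For multi-label segmentations, the same function can be applied to each label's binary mask and the results averaged.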