ModeT: Learning Deformable Image Registration via Motion Decomposition Transformer

Haiqiao Wang, Dong Ni, Yi Wang

TL;DR

This paper tackles non-rigid deformable image registration by introducing ModeT, a Motion Decomposition Transformer that explicitly models multiple motion modalities through multi-head neighborhood attention. A Competitive Weighting Module (CWM) then fuses the resulting deformation sub-fields within a pyramid registration framework, enabling progressive, interpretable deformation estimation. On the Mindboggle and LPBA brain MRI datasets, ModeT achieves state-of-the-art registration performance, improving both the Dice similarity coefficient (DSC) and the average symmetric surface distance (ASSD) while keeping Jacobian folding very low, and it outperforms both traditional methods and recent Transformer-based approaches. The approach advances practical non-rigid registration by disentangling motion modes, reducing computation, and producing coherent, physically plausible deformation fields.
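The two kinds of quantity named above, overlap (DSC) and folding of the deformation field, can be made concrete with a short NumPy sketch. This is only an illustration under stated assumptions: it works in 2D for brevity, treats the field as a displacement `u` so that the mapping is `x + u(x)`, and uses finite differences via `np.gradient`. The function names `dice` and `folding_fraction` are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two boolean masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * np.logical_and(a, b).sum() / total

def folding_fraction(disp):
    """Fraction of pixels whose Jacobian determinant is non-positive.

    disp has shape (H, W, 2) and holds the (row, column) displacement,
    so the deformation is x -> x + disp(x) and its Jacobian is I + grad(disp).
    """
    duy_dy = np.gradient(disp[..., 0], axis=0)
    duy_dx = np.gradient(disp[..., 0], axis=1)
    dux_dy = np.gradient(disp[..., 1], axis=0)
    dux_dx = np.gradient(disp[..., 1], axis=1)
    det = (1.0 + duy_dy) * (1.0 + dux_dx) - duy_dx * dux_dy
    return float(np.mean(det <= 0))
```

A zero displacement gives a determinant of 1 everywhere and therefore no folding, whereas a field that mirrors the image along one axis gives a negative determinant at every pixel.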

Abstract

Transformer structures have been widely used in computer vision and have recently made an impact in the area of medical image registration. However, the use of Transformers in most registration networks is straightforward. These networks often merely use the attention mechanism to boost feature learning, as segmentation networks do, and are not sufficiently designed for the registration task. In this paper, we propose a novel motion decomposition Transformer (ModeT) to explicitly model multiple motion modalities by fully exploiting the intrinsic capability of the Transformer structure for deformation estimation. The proposed ModeT naturally transforms the multi-head neighborhood attention relationship into the multi-coordinate relationship to model multiple motion modes. Then the competitive weighting module (CWM) fuses multiple deformation sub-fields to generate the resulting deformation field. Extensive experiments on two public brain magnetic resonance imaging (MRI) datasets show that our method outperforms current state-of-the-art registration networks and Transformers, demonstrating the potential of our ModeT for the challenging non-rigid deformation estimation problem. The benchmarks and our code are publicly available at https://github.com/ZAX130/SmileCode.
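One way to read "transforming the multi-head neighborhood attention relationship into the multi-coordinate relationship" is that each head's attention weights over a local neighborhood are used to average the relative coordinates of that neighborhood, which yields one displacement vector per position and per head. The NumPy sketch below illustrates that reading only. It is 2D for brevity, it omits any learned projections, and the function name `attention_to_subfields` and the parameter `radius` are hypothetical; the authors' implementation is in the linked repository.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_to_subfields(fixed_feat, moving_feat, num_heads, radius=1):
    """Turn neighborhood attention into one displacement sub-field per head.

    fixed_feat, moving_feat: arrays of shape (H, W, C).
    Returns an array of shape (num_heads, H, W, 2) of (row, column) offsets.
    """
    H, W, C = fixed_feat.shape
    assert C % num_heads == 0, "channels must split evenly across heads"
    d = C // num_heads
    # Relative coordinates of every position in the (2r+1) x (2r+1) window.
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    padded = np.pad(moving_feat, ((radius, radius), (radius, radius), (0, 0)))
    # neigh[n, y, x] is the moving feature at (y, x) + offsets[n] (zero padded).
    neigh = np.stack([padded[radius + dy:radius + dy + H,
                             radius + dx:radius + dx + W]
                      for dy, dx in offsets])
    q = fixed_feat.reshape(H, W, num_heads, d)
    k = neigh.reshape(len(offsets), H, W, num_heads, d)
    # Scaled dot-product similarity between each fixed feature and its neighbors.
    logits = np.einsum('hwsd,nhwsd->shwn', q, k) / np.sqrt(d)
    weights = softmax(logits, axis=-1)            # (S, H, W, n)
    # Expected relative coordinate under the attention weights.
    return weights @ np.asarray(offsets, dtype=float)
```

Because each sub-field is a convex combination of offsets inside the window, its magnitude per axis is bounded by `radius`, which is one reason a coarse-to-fine pyramid is a natural companion to this kind of estimate.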

Paper Structure (12 sections, 7 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Illustration of the proposed deformable registration network. The encoder takes the fixed image $I_f$ and moving image $I_m$ as input to extract hierarchical features $F_1$-$F_5$ and $M_1$-$M_5$. The motion decomposition transformer (ModeT) is used to generate multiple deformation sub-fields, and the competitive weighting module (CWM) fuses them. Finally, the decoding pyramid outputs the total deformation field $\phi$.
  • Figure 2: Illustration of the proposed motion decomposition transformer, which employs the multi-head neighborhood attention mechanism to decompose different motion modalities. ($S=3$ in this illustration)
  • Figure 3: Illustration of the proposed competitive weighting module (CWM).
  • Figure 4: Visualized registration results from different methods on Mindboggle (top row) and LPBA (bottom row).
  • Figure 5: Visualization of the generated multi-level deformation fields ($\varphi_1$-$\varphi_5$) to register one image pair. At low-resolution levels, multiple deformation sub-fields are decomposed to effectively model different motion modalities.
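
The fusion step performed by the competitive weighting module (Figure 3) can be pictured as a per-position softmax over the sub-fields followed by a weighted sum. The sketch below is a minimal illustration under that assumption. In the paper the weights come from a learned module, whereas here the `scores` array is simply passed in, and the function name `competitive_fuse` is hypothetical.

```python
import numpy as np

def competitive_fuse(subfields, scores):
    """Fuse S deformation sub-fields into one field.

    subfields: array of shape (S, H, W, 2).
    scores:    array of shape (S, H, W); higher means the sub-field
               should dominate at that position (assumed to be given).
    """
    # Softmax across the S sub-fields so they compete at every position.
    shifted = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted sum of the sub-fields, shape (H, W, 2).
    return (weights[..., None] * subfields).sum(axis=0)
```

With equal scores the result is the plain average of the sub-fields, and when one score is much larger than the others the fused field approaches that single sub-field.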