ModeTv2: GPU-accelerated Motion Decomposition Transformer for Pairwise Optimization in Medical Image Registration

Haiqiao Wang; Zhuoyuan Wang; Dong Ni; Yi Wang

TL;DR

ModeTv2 addresses deformable image registration (DIR) with a GPU-accelerated, interpretable operator that supports accurate and efficient pairwise optimization. It uses a pyramid architecture that combines a GPU-accelerated Motion Decomposition Transformer (ModeT) with a lightweight RegHead, which fuses multiple motion subfields into the total deformation field $\phi$, optionally via a diffeomorphic layer. The approach reports state-of-the-art registration performance and fast convergence on four public datasets, with strong pairwise optimization in both same-domain and cross-domain settings, owing to CUDA-accelerated computation and inductive biases suited to registration. The work improves the usability and generalization of deep learning (DL) based DIR, offering a practical solution for clinical image registration with potential extensions to multi-modal scenarios.

Abstract

Deformable image registration plays a crucial role in medical imaging, aiding disease diagnosis and image-guided interventions. Traditional iterative methods are slow, while deep learning (DL) methods are faster but face usability and precision challenges. This study introduces a pyramid network with the enhanced motion decomposition Transformer (ModeTv2) operator, which shows strong pairwise optimization (PO) capability akin to that of traditional methods. We re-implement the ModeT operator with CUDA extensions to improve its computational efficiency. We further propose the RegHead module, which refines deformation fields, improves the realism of the deformation, and reduces the number of parameters. By adopting PO, the proposed network balances accuracy, efficiency, and generalizability. Extensive experiments on three public brain MRI datasets and one abdominal CT dataset demonstrate the network's suitability for PO, providing a DL model with enhanced usability and interpretability. The code is publicly available at https://github.com/ZAX130/ModeTv2.
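To make the role of the deformation field concrete, the following is a minimal sketch of how a dense displacement field can warp a moving image. It is not the authors' implementation: the function name `warp_bilinear`, the 2D setting, the pixel-unit `(dy, dx)` displacement convention, bilinear interpolation, and border clamping are all assumptions chosen for illustration (the paper works on 3D volumes).

```python
from math import floor


def warp_bilinear(moving, disp):
    """Warp a 2D image with a dense displacement field (illustrative only).

    moving: H x W list of rows holding intensities.
    disp:   H x W list of rows holding (dy, dx) displacements in pixels.
    Output pixel (y, x) samples `moving` at (y + dy, x + dx).
    Sample positions outside the image are clamped to the border (assumption).
    """
    h, w = len(moving), len(moving[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = disp[y][x]
            # Clamp the sampling position so it stays inside the image.
            sy = min(max(y + dy, 0.0), h - 1.0)
            sx = min(max(x + dx, 0.0), w - 1.0)
            y0, x0 = int(floor(sy)), int(floor(sx))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = sy - y0, sx - x0
            # Bilinear interpolation between the four neighbouring pixels.
            top = (1 - wx) * moving[y0][x0] + wx * moving[y0][x1]
            bottom = (1 - wx) * moving[y1][x0] + wx * moving[y1][x1]
            warped[y][x] = (1 - wy) * top + wy * bottom
    return warped


# Toy example: a single bright pixel and a uniform one-pixel displacement.
moving = [[0, 0, 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 0]]
identity = [[(0.0, 0.0)] * 4 for _ in range(3)]
shift_x = [[(0.0, 1.0)] * 4 for _ in range(3)]
half_x = [[(0.0, 0.5)] * 4 for _ in range(3)]

same = warp_bilinear(moving, identity)
shifted = warp_bilinear(moving, shift_x)
blurred = warp_bilinear(moving, half_x)
```

With the uniform field `shift_x`, every output pixel reads from one pixel to its right, so the bright pixel appears one column to the left in `shifted`. In a learned registration method, the field would instead be predicted by the network and vary per position.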

Paper Structure (34 sections, 7 equations, 12 figures, 5 tables)

Introduction
Related Work
  Traditional Registration Methods
  Structures of Deep Registration
    Single Stage
    Cascading and Recurrent Structure
    Pyramid Structure
  Components of Deep Registration
    Convolution
    Self-Attention
    Cross-Attention
  Generalization Studies in Deep Registration
Method
  Network Overview
  Encoder
...and 19 more sections

Figures (12)

Figure 1: Illustration of deformable image registration. Given a fixed image (a) and a moving image (b), deformable registration estimates a non-rigid deformation field (c) that warps the moving image into a result (d) matching the fixed image.
Figure 2: Illustration of the proposed deformable registration network. The encoder takes the fixed image $I_f$ and moving image $I_m$ as input and extracts hierarchical features $F_1$-$F_5$ and $M_1$-$M_5$. ModeTv2 consists of a GPU-accelerated motion decomposition Transformer (ModeT) and a registration head (RegHead). It generates multiple deformation subfields and then fuses them. Finally, the decoding pyramid outputs the total deformation field $\phi$.
Figure 3: Illustration of the computation process for the residual subfield $\varphi_p$ at position $p=(i,j)$. Subfigure (a) illustrates the potential limitation of our previous ModeT, while subfigure (b) shows the computation process of ModeTv2. (For ease of understanding, the operations are shown in 2D, with $S=3$ in this illustration.)
Figure 4: Box plots showing the distributions of DSC scores of four regions on the ABCT dataset produced by different registration methods.
Figure 5: Box plots showing the distributions of DSC scores of seven regions on the LPBA dataset produced by different registration methods.
...and 7 more figures
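
The box plots in Figures 4 and 5 report DSC (Dice similarity coefficient) scores per anatomical region. As a minimal sketch of how such a per-region score can be computed, the snippet below compares two label maps for a single label. The function name `dice_score`, the flattened-list representation of the segmentations, and the toy label values are assumptions for illustration, not taken from the paper's code.

```python
def dice_score(seg_a, seg_b, label):
    """Dice similarity coefficient for one label in two flattened label maps.

    DSC = 2 * |A and B| / (|A| + |B|), where A and B are the sets of voxels
    carrying `label` in each segmentation.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("label maps must have the same number of voxels")
    in_a = sum(1 for v in seg_a if v == label)
    in_b = sum(1 for v in seg_b if v == label)
    both = sum(1 for a, b in zip(seg_a, seg_b) if a == label and b == label)
    if in_a + in_b == 0:
        # Label absent from both maps: the score is undefined.
        return float("nan")
    return 2.0 * both / (in_a + in_b)


# Toy label maps: labels of the fixed image vs. warped labels of the moving image.
fixed_labels = [0, 1, 1, 2, 2, 2]
warped_labels = [0, 1, 2, 2, 2, 0]

dsc_region_1 = dice_score(fixed_labels, warped_labels, label=1)
dsc_region_2 = dice_score(fixed_labels, warped_labels, label=2)
```

A score of 1 means the region overlaps perfectly after registration and 0 means no overlap, so higher boxes in such plots indicate better alignment of that region.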
