VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module

Ziyang Wang; Jian-Qing Zheng; Chao Ma; Tao Guo

TL;DR

This work tackles multi-modality 3D deformable image registration by integrating Visual State Space Model (VMamba) blocks into a CNN-based registration framework, VMambaMorph. It introduces a recursive coarse-to-fine registration strategy with residual updates $\phi_k=\phi_{k-1}\circ\varphi_k$, where $\varphi_k$ is the residual deformation estimated at step $k$, and a 3D Cross-Scan VMamba module that models global dependencies efficiently. A two-branch, weight-sharing fine-grained feature extractor improves cross-modality feature matching in volumetric data. On the SR-Reg brain MR-CT dataset, VMambaMorph achieves higher Dice scores than VoxelMorph, TransMorph, and MambaMorph while maintaining diffeomorphism and practical runtimes; the authors also release public code for reproducibility.
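
The residual update $\phi_k=\phi_{k-1}\circ\varphi_k$ composes dense deformations. The NumPy sketch below is a minimal illustration under stated assumptions, not the authors' implementation: each deformation is stored as a voxel displacement field with $\phi(x)=x+u(x)$, warping uses trilinear sampling with border clamping, and `step_fn` is a hypothetical stand-in for the VMambaMorph network that predicts the residual $\varphi_k$ from the fixed volume and the currently warped moving volume.

```python
import numpy as np


def identity_grid(shape):
    """Voxel coordinates of a (D, H, W) volume, shape (3, D, H, W)."""
    axes = [np.arange(s, dtype=float) for s in shape]
    return np.stack(np.meshgrid(*axes, indexing="ij"))


def trilinear_sample(vol, coords):
    """Sample vol (C, D, H, W) at voxel coordinates coords (3, D, H, W).

    Assumption: coordinates outside the volume are clamped to the border.
    """
    size = vol.shape[1:]
    c = [np.clip(coords[i], 0, size[i] - 1) for i in range(3)]
    lo = [np.floor(ci).astype(int) for ci in c]
    hi = [np.minimum(l + 1, s - 1) for l, s in zip(lo, size)]
    frac = [ci - l for ci, l in zip(c, lo)]
    out = np.zeros((vol.shape[0],) + coords.shape[1:])
    # Weighted sum over the 8 corners of the enclosing voxel cell.
    for iz, wz in ((lo[0], 1 - frac[0]), (hi[0], frac[0])):
        for iy, wy in ((lo[1], 1 - frac[1]), (hi[1], frac[1])):
            for ix, wx in ((lo[2], 1 - frac[2]), (hi[2], frac[2])):
                out += vol[:, iz, iy, ix] * (wz * wy * wx)
    return out


def warp(moving, u):
    """Pull-back warp: warped(x) = moving(x + u(x))."""
    return trilinear_sample(moving, identity_grid(u.shape[1:]) + u)


def compose(u_prev, u_new):
    """Displacement of phi_prev o phi_new, with phi(x) = x + u(x):
    (phi_prev o phi_new)(x) = x + u_new(x) + u_prev(x + u_new(x))."""
    return u_new + warp(u_prev, u_new)


def recursive_register(step_fn, fixed, moving, num_steps):
    """phi_k = phi_{k-1} o varphi_k, where the hypothetical step_fn predicts
    the residual varphi_k from (fixed, moving warped by phi_{k-1})."""
    u = np.zeros((3,) + fixed.shape[1:])
    for _ in range(num_steps):
        v = step_fn(fixed, warp(moving, u))
        u = compose(u, v)
    return u
```

Because `compose` samples the accumulated field at the residual-displaced positions, a toy `step_fn` that always predicts a constant 0.5-voxel shift accumulates to a 1-voxel displacement after two recursive steps.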

Abstract

Image registration, a critical process in medical imaging, aligns different sets of medical imaging data into a single unified coordinate system. Deep learning networks such as the Convolutional Neural Network (CNN)-based VoxelMorph, the Vision Transformer (ViT)-based TransMorph, and the State Space Model (SSM)-based MambaMorph have demonstrated effective performance in this domain. The recent Visual State Space Model (VMamba), which combines a cross-scan module with an SSM, has shown promising improvements in modeling global dependencies at efficient computational cost in computer vision tasks. This paper explores VMamba for image registration and introduces VMambaMorph, a novel hybrid VMamba-CNN network designed specifically for 3D image registration. Built on a U-shaped architecture, VMambaMorph computes the deformation field from the target and source volumes. The VMamba-based block, originally built around a 2D cross-scan module, is redesigned for 3D volumetric feature processing. To handle the complex motion and structure in multi-modality images, we further propose a recursive registration framework that fine-tunes the deformation over successive steps. We validate VMambaMorph on a public benchmark brain MR-CT registration dataset and compare its performance against current state-of-the-art methods. The results indicate that VMambaMorph achieves competitive registration quality. The code for VMambaMorph and all baseline methods is available on GitHub.
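
VMamba's 2D cross-scan module unfolds a feature map into four 1D sequences (row-wise and column-wise, each in forward and reverse order), runs a selective scan over each, and merges the results back onto the grid. How VMambaMorph orders its 3D scans is not specified here, so the NumPy sketch below assumes a hypothetical set of three spatial-axis orders, each scanned forward and in reverse, and replaces the selective scan (S6) with a fixed-decay linear recurrence (`toy_linear_scan`, a placeholder). It shows only why scanning in several directions gives every output voxel access to every input voxel.

```python
import numpy as np

# Hypothetical 3D scan orders (an assumption, not taken from the paper):
# each permutation of the spatial axes (D, H, W) makes a different axis the
# fastest-varying one when the volume is flattened.
AXIS_ORDERS = [(0, 1, 2), (0, 2, 1), (1, 2, 0)]


def _channel_perm(order):
    """Prepend the channel axis to a spatial-axis permutation."""
    return (0,) + tuple(a + 1 for a in order)


def cross_scan_3d(feat):
    """Unfold feat (C, D, H, W) into K = 2 * len(AXIS_ORDERS) sequences of
    length D*H*W: each axis order in forward and reverse direction."""
    c = feat.shape[0]
    seqs = []
    for order in AXIS_ORDERS:
        s = np.transpose(feat, _channel_perm(order)).reshape(c, -1)
        seqs.extend([s, s[:, ::-1]])
    return np.stack(seqs)  # shape (K, C, L)


def cross_merge_3d(seqs, spatial_shape):
    """Inverse of cross_scan_3d: map every sequence back onto the
    (D, H, W) grid and sum over the K directions."""
    c = seqs.shape[1]
    out = np.zeros((c,) + tuple(spatial_shape))
    for i, order in enumerate(AXIS_ORDERS):
        perm = _channel_perm(order)
        permuted_shape = (c,) + tuple(spatial_shape[a] for a in order)
        inverse = tuple(np.argsort(perm))
        # Un-reverse the backward sequence before folding it back.
        for s in (seqs[2 * i], seqs[2 * i + 1][:, ::-1]):
            out += np.transpose(s.reshape(permuted_shape), inverse)
    return out


def toy_linear_scan(seq, decay=0.9):
    """Placeholder for the selective scan (S6): a per-channel recurrence
    h_t = decay * h_{t-1} + x_t with a fixed, input-independent decay."""
    h = np.zeros(seq.shape[0])
    out = np.empty(seq.shape)
    for t in range(seq.shape[1]):
        h = decay * h + seq[:, t]
        out[:, t] = h
    return out


def cross_scan_block(feat, decay=0.9):
    """Cross-scan, run a 1D scan per direction, then cross-merge."""
    seqs = cross_scan_3d(feat)
    scanned = np.stack([toy_linear_scan(s, decay) for s in seqs])
    return cross_merge_3d(scanned, feat.shape[1:])
```

Perturbing a single input voxel changes every output voxel in the same channel: each voxel lies after the perturbed one in either the forward or the reverse sequence, whereas a single forward scan would only propagate the change to voxels later in its scan order.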

Paper Structure

This paper contains 8 sections, 12 equations, 3 figures, and 2 tables.

Introduction
Method
Visual State Space Model
Recursive Registration Framework
Enhanced Fine-grained Feature Extractor
Training Setting
Experiments
Conclusion

Figures (3)

Figure 1: The Recursive Registration Framework of VMambaMorph.
Figure 2: The Architecture of VMambaMorph, and the Details of Visual State Space Block.
Figure 3: The Training History of VoxelMorph, TransMorph, MambaMorph, and VMambaMorph with and without the Feature Extractors.