MambaReg: Mamba-Based Disentangled Convolutional Sparse Coding for Unsupervised Deformable Multi-Modal Image Registration

Kaiang Wen, Bin Xie, Bin Duan, Yan Yan

TL;DR

MambaReg is a Mamba-based architecture that combines the local feature extraction of convolutional layers with the long-range dependency modeling of Mamba, and disentangles the modality-independent features responsible for registration from modality-dependent, non-aligning features.

Abstract

Precise alignment of multi-modal images with inherent feature discrepancies is a central challenge in deformable image registration. Traditional learning-based approaches often treat registration networks as black boxes that lack interpretability. One core insight is that disentangling alignment features from non-alignment features across modalities is beneficial. Meanwhile, prominent methods for image registration, such as convolutional neural networks, struggle to capture long-range dependencies because of their local receptive fields. These methods often fail when the given image pair has a large misalignment, because they do not effectively learn long-range dependencies and correspondences. In this paper, we propose MambaReg, a novel Mamba-based architecture that leverages Mamba's strong capability for modeling long sequences to address these challenges. With several proposed sub-modules, MambaReg can effectively disentangle modality-independent features responsible for registration from modality-dependent, non-aligning features. By selectively attending to the relevant features, our network captures the correlation between multi-modal images, enabling focused deformation field prediction and precise image alignment. The architecture integrates the local feature extraction power of convolutional layers with the long-range dependency modeling capabilities of Mamba. Experiments on public non-rigid RGB-IR image datasets show that our method outperforms existing approaches in terms of registration accuracy and deformation field smoothness.

Paper Structure

This paper contains 16 sections, 8 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The framework of the proposed MambaReg for unsupervised multi-modal deformable image registration.
  • Figure 2: The architecture of the accompanying guidance network (AG-Net), pre-trained on fully aligned multi-modal image pairs.
  • Figure 3: The generation process of the ROI mask for the MSU-PID dataset.
  • Figure 4: The reconstruction process of the MSU-PID dataset. After cropping, center alignment is achieved between RGB-IR image pairs, and the surrounding container is removed from the image.
  • Figure 5: The deformable registration results on the MSU-PID dataset.