
Recursive Deformable Image Registration Network with Mutual Attention

Jian-Qing Zheng, Ziyang Wang, Baoru Huang, Ngee Han Lim, Tonia Vincent, Bartlomiej W. Papiez

TL;DR

This work addresses deformable image registration (DIR) performance limits caused by the restricted receptive field of conventional multi-stage CNNs. It introduces a Recursive Mutual Attention Network (RMAn) that fuses a recursive coarse-to-fine framework with a Mutual Attention module to expand global context without extra computation, enabling progressive refinement of the dense displacement field. On lung and abdomen CT datasets, RMAn achieves state-of-the-art results for lung registration (e.g., DSC ≈ 92%, ASD ≈ 3.8 mm) and competitive performance across 9 abdominal organs, with ablation confirming the benefits of mutual attention and recursion. The findings suggest RMAn provides accurate, efficient DIR suitable for large deformations and could be extended to multi-modal registration in the future.

Abstract

Deformable image registration, estimating the spatial transformation between different images, is an important task in medical imaging. Many previous studies have used learning-based multi-stage registration to improve the performance of 3D image registration. The performance of the multi-stage approach, however, is limited by the size of the receptive field, since complex motion does not occur at a single spatial scale. We propose a new registration network that combines a recursive network architecture with a mutual attention mechanism to overcome these limitations. Compared with state-of-the-art deep learning methods, our network based on the recursive structure achieves the highest accuracy on a lung computed tomography (CT) data set (Dice score of 92% and average surface distance of 3.8 mm for the lungs) and one of the most accurate results on an abdominal CT data set with 9 organs of various sizes (Dice score of 55% and average surface distance of 7.8 mm). We also show that adding 3 recursive networks is sufficient to achieve state-of-the-art results without a significant increase in inference time.
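
The Dice score quoted above measures the overlap between an organ label in the fixed image and the same label after warping the moving image. The following is a minimal sketch of that metric, not the authors' evaluation code: the function name `dice_score`, the flat-list label maps, and the convention of returning 1.0 when the label is absent from both maps are assumptions made for illustration.

```python
def dice_score(seg_a, seg_b, label):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for one label.

    seg_a and seg_b are equally sized flat sequences of integer labels
    (a hypothetical stand-in for flattened 3D segmentation volumes).
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("label maps must have the same number of voxels")
    size_a = sum(1 for v in seg_a if v == label)
    size_b = sum(1 for v in seg_b if v == label)
    overlap = sum(1 for a, b in zip(seg_a, seg_b) if a == label and b == label)
    if size_a + size_b == 0:
        # Assumption: a label missing from both maps counts as perfect agreement.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Toy label maps: 0 = background, 1 and 2 = two hypothetical organs.
fixed_labels = [0, 1, 1, 1, 0, 2, 2, 0]
warped_labels = [0, 1, 1, 0, 0, 2, 2, 2]
for organ in (1, 2):
    print(organ, dice_score(fixed_labels, warped_labels, organ))
```

A multi-organ figure such as the abdominal result would typically average this value over the organ labels; whether the paper averages per organ or per case is not stated here.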

Paper Structure

This paper contains 19 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The proposed Recursive Mutual Attention Network (RMAn) framework, comprising a Siamese encoder-decoder structure interconnected by Mutual Attention; the network structure is detailed in Fig. 2. Here $k\in[1,K]\cap\mathbb{Z}$ denotes the recursion index and $K\in\mathbb{Z}_+$ denotes the total number of recursions.
  • Figure 2: The subnetwork in Fig. 1 comprises three main components: a Siamese encoder consisting of four pairs of Residual Downsampling (Res-down) blocks, a Residual Upsampling (Res-up) block, and two Mutual Attention (MA) modules.
  • Figure 3: A qualitative example in chest CT shows that our network achieves a plausible registration, with a significant improvement especially at the edges of the left kidney and the lung.
  • Figure 4: RMAn achieves the best lung registration in chest CT scans and one of the best registrations in abdominal CT scans.
  • Figure 5: Registration results on chest CT for our RMAn and the baseline RCN, with varying numbers of recursions in both training and inference. As the number of inference recursions $K_{\rm infer}$ increases, the models trained with 2 and 3 recursions achieve higher accuracy and converge to similar values, whereas the model trained with 1 recursion gets worse. RMAn outperforms RCN at every $K_{\rm infer}$ in terms of DSC and ASD.
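
The captions of Figures 1 and 5 describe a recursion over $k = 1, \dots, K$ in which the displacement field is refined progressively, with a number of inference recursions that can differ from the number used in training. The loop can be sketched on a 1D toy signal as below. This is an illustration under stated assumptions, not the authors' implementation: the names `warp_1d`, `compose_1d`, and `recursive_register` are hypothetical, the residual predictor is a placeholder for the trained network, and the composition rule (each step predicts a residual displacement from the fixed image and the currently warped moving image) is one common recursive scheme that the source does not spell out.

```python
def warp_1d(signal, disp):
    """Sample signal at x + disp[x] with linear interpolation, clamped at the borders."""
    n = len(signal)
    out = []
    for x in range(n):
        pos = min(max(x + disp[x], 0.0), n - 1.0)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append((1.0 - w) * signal[lo] + w * signal[hi])
    return out


def compose_1d(total, residual):
    """Compose displacements: new(x) = residual(x) + total(x + residual(x))."""
    shifted_total = warp_1d(total, residual)
    return [r + t for r, t in zip(residual, shifted_total)]


def recursive_register(fixed, moving, predict_residual, num_recursions):
    """Refine a displacement field over num_recursions steps (K in the captions).

    predict_residual(fixed, warped) is a hypothetical stand-in for the
    registration network; it returns one residual displacement per sample.
    """
    total = [0.0] * len(fixed)
    warped = list(moving)
    for _ in range(num_recursions):
        residual = predict_residual(fixed, warped)
        total = compose_1d(total, residual)
        # Always resample the original moving signal to avoid repeated interpolation blur.
        warped = warp_1d(moving, total)
    return total, warped


# Toy usage: a placeholder predictor that always proposes a shift of one sample.
moving = [float(v) for v in range(8)]
fixed = [0.0] * 8
total, warped = recursive_register(fixed, moving, lambda f, w: [1.0] * len(f), 3)
print(total)
print(warped)
```

Because `num_recursions` is an ordinary argument, the same predictor can be run with more recursions at inference than were used in training, which is the setting Figure 5 compares.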