Recursive Deformable Image Registration Network with Mutual Attention
Jian-Qing Zheng, Ziyang Wang, Baoru Huang, Ngee Han Lim, Tonia Vincent, Bartlomiej W. Papiez
TL;DR
This work addresses a limit on deformable image registration (DIR) performance: conventional multi-stage CNNs have a restricted receptive field, whereas complex motion spans more than one spatial scale. It introduces a Recursive Mutual Attention Network (RMAn) that combines a recursive coarse-to-fine framework with a Mutual Attention module to widen the global context and progressively refine the dense displacement field. On lung and abdomen CT datasets, RMAn achieves state-of-the-art results for lung registration (Dice score, DSC ≈ 92%; average surface distance, ASD ≈ 3.8 mm) and competitive performance across 9 abdominal organs (DSC ≈ 55%, ASD ≈ 7.8 mm), with ablation confirming the benefits of mutual attention and recursion. Adding 3 recursive networks is sufficient to reach these results without a significant increase in inference time. The findings suggest RMAn provides accurate, efficient DIR suitable for large deformations and could be extended to multi-modal registration in the future.
Abstract
Deformable image registration, which estimates the spatial transformation between different images, is an important task in medical imaging. Many previous studies have applied learning-based, multi-stage registration to 3D images to improve performance. The multi-stage approach, however, is limited by the size of its receptive field, because complex motion does not occur at a single spatial scale. We propose a new registration network that combines a recursive network architecture with a mutual attention mechanism to overcome these limitations. Compared with state-of-the-art deep learning methods, our recursive network achieves the highest accuracy on a lung computed tomography (CT) dataset (Dice score of 92% and average surface distance of 3.8 mm for lungs) and is among the most accurate on an abdominal CT dataset with 9 organs of various sizes (Dice score of 55% and average surface distance of 7.8 mm). We also show that adding 3 recursive networks is sufficient to achieve state-of-the-art results without a significant increase in inference time.
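
The Dice scores quoted above measure the overlap between an organ's segmentation in the fixed image and the same organ's segmentation after warping the moving image. As a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), and not the authors' evaluation code, it could be computed as follows. The function name `dice_score`, its arguments, and the NaN convention for an absent label are assumptions made for illustration.

```python
import numpy as np


def dice_score(seg_a, seg_b, label):
    """Dice overlap of one label between two integer label maps.

    Hypothetical helper for illustration: seg_a and seg_b are arrays of
    the same shape (e.g. fixed and warped-moving segmentations).
    """
    a = np.asarray(seg_a) == label  # binary mask of the label in A
    b = np.asarray(seg_b) == label  # binary mask of the label in B
    denom = a.sum() + b.sum()
    if denom == 0:
        # Label absent from both maps: overlap is undefined (assumed convention).
        return float("nan")
    return 2.0 * np.logical_and(a, b).sum() / denom


# Toy 2D example: label 1 covers two pixels in each map, one of them shared.
fixed = np.array([[1, 1, 0], [0, 0, 0]])
warped = np.array([[1, 0, 0], [1, 0, 0]])
score = dice_score(fixed, warped, label=1)
```

In the toy example the masks share one of their two pixels each, so the score is 2·1 / (2 + 2) = 0.5. A per-dataset figure such as those reported above would typically be an average of this quantity over organs and image pairs, although the exact averaging used in the paper is not stated here.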
