Attention-aware non-rigid image registration for accelerated MR imaging

Aya Ghoul, Jiazhen Pan, Andreas Lingg, Jens Kübler, Patrick Krumm, Kerstin Hammernik, Daniel Rueckert, Sergios Gatidis, Thomas Küstner

TL;DR

This study introduces an attention-aware, self-supervised registration framework for non-rigid, pairwise MRI registration under high acceleration. By fusing local feature matching with a transformer-based Global Motion Aggregation module and a denoiser, the method robustly estimates motion from undersampled data and refines it through iterative GRU-based updates. When integrated into motion-compensated reconstruction, the approach yields superior image quality across Cartesian and radial trajectories at accelerations up to 16x (cardiac) and 30x (respiratory) compared with both conventional and deep-learning baselines. The results suggest significant potential for reducing scan times while preserving diagnostic fidelity, with the method remaining effective across different motion patterns and sampling schemes. Limitations include a 2D implementation and retrospective undersampling; future work aims at 3D groupwise registration and end-to-end reconstruction integration.
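
The global aggregation step mentioned above can be illustrated with a small single-head self-attention sketch. It is not the authors' implementation: taking queries and keys from context features and values from motion features follows the general Global Motion Aggregation idea, and the residual weight `alpha`, all tensor sizes, and the function name `aggregate_motion` are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_motion(context, motion, w_q, w_k, w_v, alpha=1.0):
    """Hypothetical single-head global motion aggregation.

    context: (N, Dc) context features, N = H * W pixels
    motion:  (N, Dm) motion features
    w_q, w_k: (Dc, Dk) query/key projections (assumed shapes)
    w_v:      (Dm, Dm) value projection (assumed shape)
    """
    q = context @ w_q                      # queries from context
    k = context @ w_k                      # keys from context
    v = motion @ w_v                       # values from motion features
    # Each pixel attends to every other pixel (long-range context).
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Residual connection: local motion plus globally aggregated motion.
    return motion + alpha * (attn @ v), attn

rng = np.random.default_rng(0)
H, W, Dc, Dm, Dk = 4, 5, 8, 6, 8           # toy sizes, not from the paper
context = rng.standard_normal((H * W, Dc))
motion = rng.standard_normal((H * W, Dm))
w_q = rng.standard_normal((Dc, Dk))
w_k = rng.standard_normal((Dc, Dk))
w_v = rng.standard_normal((Dm, Dm))

aggregated, attn = aggregate_motion(context, motion, w_q, w_k, w_v)
print(aggregated.shape, attn.shape)
```

Because the attention weights depend only on the context features, pixels whose local matching is ambiguous can borrow motion information from pixels with similar appearance elsewhere in the image.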

Abstract

Accurate motion estimation at high acceleration factors enables rapid motion-compensated reconstruction in Magnetic Resonance Imaging (MRI) without compromising the diagnostic image quality. In this work, we introduce an attention-aware deep learning-based framework that can perform non-rigid pairwise registration for fully sampled and accelerated MRI. We extract local visual representations to build similarity maps between the registered image pairs at multiple resolution levels and additionally leverage long-range contextual information using a transformer-based module to alleviate ambiguities in the presence of artifacts caused by undersampling. We combine local and global dependencies to perform simultaneous coarse and fine motion estimation. The proposed method was evaluated on in-house acquired fully sampled and accelerated data of 101 patients and 62 healthy subjects undergoing cardiac and thoracic MRI. The impact of motion estimation accuracy on the downstream task of motion-compensated reconstruction was analyzed. We demonstrate that our model derives reliable and consistent motion fields across different sampling trajectories (Cartesian and radial) and acceleration factors of up to 16x for cardiac motion and 30x for respiratory motion. In motion-compensated reconstruction, it achieves superior image quality, both qualitatively and quantitatively, compared to conventional and recent deep learning-based approaches. The code is publicly available at https://github.com/lab-midas/GMARAFT.
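
The multi-resolution similarity maps described above can be sketched as an all-pairs correlation volume followed by repeated pooling. This is a minimal illustration, not the released code: the 2x2 average pooling, the default of four levels, the toy feature sizes, and the function names are assumptions.

```python
import numpy as np

def all_pairs_correlation(f1, f2):
    """Inner product between every pair of feature vectors.

    f1, f2: (H, W, D) feature maps of the two images.
    Returns a (H, W, H, W) volume where entry [i, j, k, l]
    is the dot product of f1[i, j] and f2[k, l].
    """
    H, W, D = f1.shape
    corr = f1.reshape(H * W, D) @ f2.reshape(H * W, D).T
    return corr.reshape(H, W, H, W)

def avg_pool_last2(c):
    """2x2 average pooling over the last two axes (assumed pooling)."""
    H, W, h, w = c.shape
    c = c[:, :, : h - h % 2, : w - w % 2]   # drop an odd trailing row/column
    return c.reshape(H, W, h // 2, 2, w // 2, 2).mean(axis=(3, 5))

def correlation_pyramid(f1, f2, levels=4):
    """Correlation volumes at `levels` resolutions, finest first."""
    pyramid = [all_pairs_correlation(f1, f2)]
    for _ in range(levels - 1):
        pyramid.append(avg_pool_last2(pyramid[-1]))
    return pyramid

rng = np.random.default_rng(0)
f1 = rng.standard_normal((16, 16, 32))      # toy encoded features
f2 = rng.standard_normal((16, 16, 32))
pyramid = correlation_pyramid(f1, f2)
for level, c in enumerate(pyramid, start=1):
    print(f"C_{level}: {c.shape}")
```

Pooling only the axes that index the second image keeps full resolution for the first image while letting coarse levels capture large displacements and fine levels capture small ones.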

Paper Structure (20 sections, 7 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Illustration of the proposed framework for non-rigid image registration with exemplary Cartesian VISTA undersampling. Local feature maps are extracted from the undersampled magnitude images $I_{t1}$ and $I_{t2}$, which are obtained by undersampling the fully sampled images $I_{t1,f}$ and $I_{t2,f}$. $A$ and $A^{H}$ refer to the multi-coil forward and backward operators used for the Cartesian undersampling. Visual correspondences are calculated as the inner product between all pairs of the encoded features of size $W\times H$, pooled to multiple levels to obtain the multi-scale tensors $C_{1\text{-}4}$, and then encoded into the motion features. The denoiser smooths the context images and outputs denoised images that are encoded into the context features. High-level context is obtained using self-attention embedded in the Global Motion Aggregation (GMA) module. $W_\text{Q}$, $W_\text{K}$ and $W_\text{V}$ are the projection heads of the queries, keys and values. Over $N$ iterations, the GRU decodes the flow initialization $u_{\text{init},i}=u_{i-1}$ and the collected feature maps into update directions $\Delta u_{1\text{-}N}$ to refine the motion estimates $u_{1\text{-}N}$.
  • Figure 2: Quantitative comparison of registered segmentation masks with manually annotated masks over the fully sampled and retrospectively undersampled acquisitions with VISTA (Cartesian) and radial (non-Cartesian) undersampling for $R=8$ and $R=16$ accelerations, using the Dice score for the left ventricle (LV) and right ventricle (RV). Motion was estimated using Elastix [klein2009elastix], VoxelMorph [balakrishnan2019voxelmorph], ViT-V-Net [chen2021vit], TransMorph [chen2022transmorph], XMorpher [shi2022xmorpher] and the proposed method. Our model outperforms the other methods on the observed structures in terms of Dice score. The symbol $*$ denotes a statistically significant difference (p-value below 0.05) compared to our method.
  • Figure 3: Cardiac motion estimation by the proposed framework compared to Elastix [klein2009elastix], VoxelMorph [balakrishnan2019voxelmorph], ViT-V-Net [chen2021vit], TransMorph [chen2022transmorph] and XMorpher [shi2022xmorpher] in a healthy subject. Predicted flow fields are shown for the fully sampled and retrospectively undersampled acquisitions with VISTA (Cartesian) and radial (non-Cartesian) undersampling for $R\!=\!16$ acceleration. To demonstrate the estimated motion direction and amplitude, motion estimations are color-encoded following [baker2011database] and, in the second row of panels, depicted as quiver plot overlays on the fully sampled moving image. Our model demonstrated consistent and superior performance compared to the other methods.
  • Figure 4: Respiratory motion estimation in a patient with a neuroendocrine tumor and unconfirmed liver metastasis for the fully sampled and Cartesian vdPD accelerated acquisitions with accelerations $R\!=\!16$ and $R\!=\!30$. Deformation fields are overlaid on the moving image. Motion-compensated reconstructions are shown next to the color-encoded deformation fields used to compute them, which indicate motion from the end-expiratory to the end-inspiratory state. Consistent performance across the different accelerations was observed.
  • Figure 5: Motion-compensated reconstructions and respective error maps between the fully sampled reference image and reconstructions using different neighboring frames $T$ in a healthy subject. Motion estimations were obtained from Cartesian VISTA accelerated acquisitions with $R\!=\!16$ using the proposed model (top) and VoxelMorph (bottom).
  • ...and 2 more figures
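
The Dice score used in Figure 2 to compare registered and manually annotated masks can be computed as in the following minimal sketch. The function name, the handling of two empty masks, and the toy masks are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def dice_score(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total

# Toy example: two 2x2 squares that overlap in one column.
warped = np.zeros((4, 4), dtype=int)
warped[1:3, 1:3] = 1            # hypothetical registered (warped) mask
reference = np.zeros((4, 4), dtype=int)
reference[1:3, 2:4] = 1         # hypothetical manual annotation

print(dice_score(warped, reference))  # 0.5
```

Each toy mask covers 4 pixels and they share 2, so the score is 2 * 2 / (4 + 4) = 0.5; a score of 1 means identical masks and 0 means no overlap.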