Highly efficient non-rigid registration in k-space with application to cardiac Magnetic Resonance Imaging

Aya Ghoul, Kerstin Hammernik, Andreas Lingg, Patrick Krumm, Daniel Rueckert, Sergios Gatidis, Thomas Küstner

TL;DR

This work proposes a novel self-supervised deep learning-based framework, dubbed the Local-All Pass Attention Network (LAPANet), for non-rigid motion estimation directly from the acquired accelerated Fourier space, i.e. k-space, following the Local All-Pass (LAP) registration technique.

Abstract

In Magnetic Resonance Imaging (MRI), motion estimates with high temporal resolution can be useful for image acquisition and reconstruction, MR-guided radiotherapy, dynamic contrast enhancement, flow and perfusion imaging, and functional assessment of motion patterns in cardiovascular, abdominal, peristaltic, fetal, or musculoskeletal imaging. Conventionally, these motion estimates are derived through image-based registration, a particularly challenging task for complex motion patterns and high dynamic resolution. The accelerated scans in such applications result in imaging artifacts that compromise the motion estimation. In this work, we propose a novel self-supervised deep learning-based framework, dubbed the Local-All Pass Attention Network (LAPANet), for non-rigid motion estimation directly from the acquired accelerated Fourier space, i.e. k-space. The proposed approach models non-rigid motion as the cumulative sum of local translational displacements, following the Local All-Pass (LAP) registration technique. LAPANet was evaluated on cardiac motion estimation across various sampling trajectories and acceleration rates. Our results demonstrate superior accuracy compared to prior conventional and deep learning-based registration methods, accommodating as few as 2 lines/frame in a Cartesian trajectory and 3 spokes/frame in a non-Cartesian trajectory. The achieved high temporal resolution (less than 5 ms) for non-rigid motion opens new avenues for motion detection, tracking and correction in dynamic and real-time MRI applications.
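
The abstract describes non-rigid motion as a sum of local translational displacements estimated in k-space. One reason translations are a natural building block there is the Fourier shift theorem: translating an image multiplies its k-space by a linear phase ramp. The following is a minimal NumPy sketch of that property only, not of LAPANet itself; the function name `shift_in_kspace` and the random test image are illustrative assumptions, and the check uses integer circular shifts so that it can be compared against `np.roll`.

```python
import numpy as np


def shift_in_kspace(image, dy, dx):
    """Translate a 2D image by (dy, dx) pixels via a linear phase ramp in k-space.

    Hypothetical helper for illustration; the shift is circular (periodic).
    """
    h, w = image.shape
    ky = np.fft.fftfreq(h)[:, None]  # spatial frequencies along rows (cycles/pixel)
    kx = np.fft.fftfreq(w)[None, :]  # spatial frequencies along columns
    kspace = np.fft.fft2(image)
    # Fourier shift theorem: x[n - d]  <->  X[k] * exp(-2*pi*i*k*d/N)
    phase_ramp = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.fft.ifft2(kspace * phase_ramp)


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))

# Shift by 3 pixels down and 5 pixels left, once in k-space and once in image space.
shifted = shift_in_kspace(image, 3, -5).real
reference = np.roll(image, shift=(3, -5), axis=(0, 1))
max_abs_error = float(np.max(np.abs(shifted - reference)))
```

Here `max_abs_error` stays at the level of floating-point round-off, which shows that a global translation is fully encoded in the k-space phase. Non-rigid motion does not reduce to a single phase ramp, which is why it is modeled from local translations.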

Paper Structure

This paper contains 20 sections, 7 equations, 11 figures.

Figures (11)

  • Figure 1: LAPANet architecture for non-rigid registration in k-space with exemplary Cartesian undersampling. The accelerated coil-resolved fixed ($k_{fix}$) and moving ($k_{mov}$) k-spaces are obtained by undersampling the fully sampled fixed ($k_{fix,f}$) and moving ($k_{mov,f}$) k-spaces during training. $A^{H}$ refers to the multi-coil backward operation, which is used for loss calculation $\mathcal{L}_{LAPANet}$. The real and imaginary parts of the coil-resolved $k_{fix}$ and $k_{mov}$ are concatenated (CAT) to create the input. Global Residual Modules (Fig. 2) extract multi-scale k-space features at $4$ levels, denoted as $L_i$. The Encoder and Decoder Blocks (Fig. 3) extract local and global representations relying on transformer modules and convolutional operations with varying strides. The Motion Attention Modules (Fig. 4) refine the motion estimation $u_i$ of the current level. We also learn the global translational motion $u_t$ that aligns the moving image to the fixed image.
  • Figure 2: Illustration of the Global Residual Module. The full-sized stacked k-spaces of shape $(C_{in}, H, W)$ are processed through a residual block connection structure to output a feature map of shape $(C_{out}, H, W)$ for the first network level $L_1$ and $(C_{out}, H/2^{(i-1)}, W/2^{(i-1)})$ for the other levels $L_i$. The cross-layer connection involves convolutional mapping. The main connection incorporates a self-attention module, replacing traditional linear projection with depthwise convolutions to preserve spatial context. Channel-wise convolution is applied for coil weighting and information storage. Next, the Attention-weighted Squeeze and Excitation Block recalibrates channel-wise responses by squeezing the global spatial information into a channel descriptor. An attention mechanism then learns channel-specific weights, which are used to excite the original feature maps so that the network focuses dynamically on important channels.
  • Figure 3: Illustration of the encoding and decoding blocks. Encoding combines the input of shape $(C_{in}, H, W)$ with the Global Residual Module output. Decoding first applies nearest-neighbor upsampling while integrating skip-connection features. Both the encoding and decoding blocks then comprise a Channel Integration Module followed by a Dilated Fusion Module, and output feature maps of shape $(C_{out}, H/2, W/2)$ and $(C_{out}, 2H, 2W)$, respectively. Sigmoid-weighted Linear Unit (SiLU) activations are utilized throughout. Additionally, a self-attention mechanism is employed by projecting the features into Queries, Keys, and Values using depthwise convolutions to preserve spatial context and leverage correlations across the channel dimension.
  • Figure 4: Illustration of the Motion Attention Module. This module upsamples by a factor of 2 using bilinear interpolation, followed by feature encoding from $(C_{in}, 2H, 2W)$ to $(2, 2H, 2W)$. Then, a refining operation improves the current decoder estimation by combining current and previous motion estimates. Separate attention masks are learned for the two channels (Channel 1, Channel 2), which correspond to the spatial dimensions of the image. The motion estimation channels are then weighted individually using the learned attention masks for dimension-specific fine-tuning. Finally, these weighted channels are concatenated (CAT) to obtain the motion estimation at the current level.
  • Figure 5: Boxplots of normalized root-mean-square error (NRMSE), Dice scores (DSC), and Hausdorff Distances (HDD) after registration with the predicted motion from the proposed LAPANet in comparison to GMA-RAFT, VoxelMorph, and Elastix. Metrics are shown for motion estimation in the fully sampled case and at different accelerations using a Cartesian VISTA sampling on the test dataset. The symbol $*$ denotes a statistically significant difference, indicated by $P<0.05$ when compared to LAPANet. LAPANet outperformed other image-based methods, maintaining superior scores across different accelerations. Alternative methods yielded continuous performance degradation with increased accelerations. Elastix was not able to predict any motion in these highly accelerated cases and was omitted.
  • ...and 6 more figures
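
The evaluation in Figure 5 reports NRMSE, Dice scores, and Hausdorff distances. The sketch below shows one way such metrics can be computed with NumPy; it is not the authors' evaluation code. The exact conventions are assumptions: NRMSE is normalized here by the Euclidean norm of the reference (other normalizations are common), and the Hausdorff distance is taken over all foreground pixels of two non-empty masks rather than over extracted contours. All function names are hypothetical.

```python
import numpy as np


def nrmse(reference, estimate):
    """Root-mean-square error normalized by the reference's Euclidean norm (assumed convention)."""
    return float(np.linalg.norm(reference - estimate) / np.linalg.norm(reference))


def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


def hausdorff(mask_a, mask_b):
    """Symmetric Hausdorff distance (in pixels) between foreground pixels of two non-empty masks."""
    pts_a = np.argwhere(mask_a)
    pts_b = np.argwhere(mask_b)
    # Pairwise Euclidean distances between every foreground pixel of A and of B.
    dists = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    return float(max(dists.min(axis=1).max(), dists.min(axis=0).max()))


# Toy example: a 4x4 square and the same square shifted by one pixel to the right.
mask_fixed = np.zeros((8, 8), dtype=bool)
mask_fixed[2:6, 2:6] = True
mask_warped = np.roll(mask_fixed, 1, axis=1)

dice_score = dice(mask_fixed, mask_warped)        # 12 shared pixels of 16 + 16 -> 0.75
hausdorff_px = hausdorff(mask_fixed, mask_warped)  # one-pixel offset -> 1.0
self_error = nrmse(mask_fixed.astype(float), mask_fixed.astype(float))  # identical -> 0.0
```

The brute-force pairwise distance matrix is adequate for small masks; for full-resolution segmentations a contour-based or tree-based nearest-neighbor search would scale better.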