Subspace Implicit Neural Representations for Real-Time Cardiac Cine MR Imaging

Wenqi Huang, Veronika Spieker, Siying Xu, Gastao Cruz, Claudia Prieto, Julia Schnabel, Kerstin Hammernik, Thomas Kuestner, Daniel Rueckert

TL;DR

This work tackles real-time cardiac cine MRI under sparse radial sampling by introducing a subspace implicit neural representation (INR) that learns separate spatial and temporal bases with two MLPs. By representing the dynamic image as the product of these bases and leveraging the Fourier slice theorem, the method avoids data binning and NUFFT, enabling continuous, spoke-wise reconstruction guided by a low-rank prior. The networks are initialized from a GRASP-based low-resolution reconstruction and its SVD-derived bases, then fine-tuned with spoke-specific losses; this yields better spatial detail and temporal fidelity than conventional NUFFT and GRASP reconstructions at acceleration factors of 10 and 20, in both quantitative (SNR) and qualitative (edge sharpness) comparisons. The approach could enable high-resolution, motion-resolved cardiac imaging in real time and lays groundwork for extensions to higher-dimensional and multi-contrast MRI; further work is needed on reconstruction speed and clinical validation.
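The Fourier slice theorem mentioned above can be checked numerically in a few lines. The following is a minimal sketch, not the authors' code: it uses a random complex test image and only the two axis-aligned projection angles, where the discrete identity is exact. The variable names and the image size are illustrative assumptions; for arbitrary spoke angles the method instead evaluates the representation on rotated coordinates, which a fixed pixel grid cannot do without interpolation.

```python
import numpy as np

# Assumption: a random complex 32x32 array stands in for one image frame.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))

# Full 2D k-space of the frame (indexing is [ky, kx]).
kspace = np.fft.fft2(image)

# 0-degree spoke: project along y (axis 0), then take a 1D FFT over x.
projection_along_y = image.sum(axis=0)
spoke_0deg = np.fft.fft(projection_along_y)

# 90-degree spoke: project along x (axis 1), then take a 1D FFT over y.
projection_along_x = image.sum(axis=1)
spoke_90deg = np.fft.fft(projection_along_x)

# The 1D FFT of each projection matches the k-space line through the origin.
error_0deg = np.max(np.abs(spoke_0deg - kspace[0, :]))
error_90deg = np.max(np.abs(spoke_90deg - kspace[:, 0]))
print(error_0deg, error_90deg)
```

Both errors should sit at floating-point round-off level. This is why a spoke can be simulated from an image with a projection and a 1D FFT, with no gridding or non-uniform FFT involved.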

Abstract

Conventional cardiac cine MRI methods rely on retrospective gating, which limits temporal resolution and the ability to capture continuous cardiac dynamics, particularly in patients with arrhythmias and beat-to-beat variations. To address these challenges, we propose a reconstruction framework based on subspace implicit neural representations for real-time cardiac cine MRI of continuously sampled radial data. This approach employs two multilayer perceptrons to learn spatial and temporal subspace bases, leveraging the low-rank properties of cardiac cine MRI. Initialized with low-resolution reconstructions, the networks are fine-tuned using spoke-specific loss functions to recover spatial details and temporal fidelity. Our method directly utilizes the continuously sampled radial k-space spokes during training, thereby eliminating the need for binning and non-uniform FFT. It achieves superior spatial and temporal image quality compared to conventional binned methods at acceleration rates of 10 and 20, demonstrating potential for high-resolution imaging of dynamic cardiac events and enhanced diagnostic capability.
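As an illustration of the two-network subspace model described above, the sketch below builds a spatial MLP and a temporal MLP with random, untrained weights and forms the image series as the product of their outputs. Everything specific here is an assumption made for the example: the layer widths, the tanh activation, the rank of 4, real-valued outputs, and the helper names `init_mlp` and `mlp`. The source does not state the actual architecture or input encoding.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (hypothetical helper)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, inputs):
    """Forward pass: tanh on hidden layers, linear output layer."""
    h = inputs
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.tanh(h)
    return h

rank = 4                                      # assumed number of subspace components
rng = np.random.default_rng(0)
spatial_net = init_mlp(rng, [2, 32, rank])    # (x, y) -> rank spatial coefficients
temporal_net = init_mlp(rng, [1, 32, rank])   # t -> rank temporal coefficients

# Regular x-y-t grid, as used for the final reconstruction step.
n, n_frames = 16, 10
axis = np.linspace(-1.0, 1.0, n)
yy, xx = np.meshgrid(axis, axis, indexing="ij")
coords = np.stack([xx.ravel(), yy.ravel()], axis=1)   # (n*n, 2)
times = np.linspace(0.0, 1.0, n_frames)[:, None]      # (n_frames, 1)

spatial_basis = mlp(spatial_net, coords)     # (n*n, rank)
temporal_basis = mlp(temporal_net, times)    # (n_frames, rank)

# Dynamic image = product of the bases; its rank is at most `rank` by construction.
casorati = spatial_basis @ temporal_basis.T            # (n*n, n_frames)
series = casorati.T.reshape(n_frames, n, n)
print(series.shape, np.linalg.matrix_rank(casorati))
```

Because both networks take continuous coordinates, the same model can be queried at any time point or on rotated spatial coordinates, which is what makes spoke-wise training without binning possible.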

Paper Structure

This paper contains 14 sections, 8 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Method overview: a) Continuous k-space sampling with tiny golden angle radial trajectory; b) Binning spoke centers to reconstruct a low-resolution image with GRASP, decomposing into spatial and temporal bases via SVD, retaining top-k components; c) Interpolating and fitting low-resolution bases to spatial and temporal networks; d) Inputting rotated spatial and accurate temporal coordinates into networks to obtain spatial and temporal bases, whose product forms rotated images; networks are optimized with Eq. \ref{eq:fine_tune}; e) Final reconstruction by inputting a regular $x-y-t$ grid.
  • Figure 2: Comparison of reconstruction results. a) and b) show two subjects. The first row of each subfigure shows $x-y$ images at a selected cardiac phase. The second row zooms into a region of interest (ROI). The third row presents $x-t$ profiles along a chosen $y$-coordinate. Our method preserves details in the ROI (yellow arrow) and minimizes temporal blurring seen in binned methods (red arrow). The color bars indicate intensity ranges.
  • Figure 3: The first four components of the spatial and temporal bases for an example case. a) The spatial bases are shown across four columns, where the first row displays the initialized spatial bases derived from low-resolution reconstruction, the second row shows the fine-tuned representations obtained by training based on the low-resolution initialization, and the third row presents the bases directly learned without initialization. b) The temporal bases correspond to the spatial components in a), with orange lines representing the low-resolution initialization, green lines showing the fine-tuned bases, and blue lines depicting the directly learned components without initialization. The results demonstrate that with low-resolution initialization, the models successfully capture finer spatial structures and temporal details after training on non-binned k-space spokes. In contrast, without initialization, the models fail to converge effectively.
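The SVD step in Figure 1b, which splits a low-resolution image series into spatial and temporal bases and keeps the top-k components, can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' implementation: the function name `truncated_bases` is hypothetical, the input is a synthetic real-valued series of known rank rather than a GRASP reconstruction, and folding the singular values into the spatial basis is one possible convention.

```python
import numpy as np

def truncated_bases(series, k):
    """Split a (frames, ny, nx) series into top-k spatial and temporal bases.

    Hypothetical helper: singular values are absorbed into the spatial basis.
    """
    n_frames = series.shape[0]
    casorati = series.reshape(n_frames, -1).T          # (pixels, frames)
    u, s, vh = np.linalg.svd(casorati, full_matrices=False)
    spatial = u[:, :k] * s[:k]                         # (pixels, k)
    temporal = vh[:k, :]                               # (k, frames)
    return spatial, temporal

# Synthetic stand-in for a low-resolution series with exact rank 3.
rng = np.random.default_rng(0)
true_rank = 3
maps = rng.standard_normal((true_rank, 8, 8))          # spatial patterns
curves = rng.standard_normal((20, true_rank))          # temporal weights
series = np.einsum("tr,ryx->tyx", curves, maps)        # (20, 8, 8)

spatial, temporal = truncated_bases(series, k=3)

# Recombine the bases and compare with the input series.
approx = (spatial @ temporal).T.reshape(series.shape)
rel_error = np.linalg.norm(approx - series) / np.linalg.norm(series)
print(spatial.shape, temporal.shape, rel_error)
```

With k equal to the true rank, the recombined series matches the input to round-off. On real data, a smaller k discards the weakest components, and the retained bases would serve as fitting targets for the spatial and temporal networks before fine-tuning.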