Spatiotemporal implicit neural representation for unsupervised dynamic MRI reconstruction

Jie Feng; Ruimin Feng; Qing Wu; Zhiyong Zhang; Yuyao Zhang; Hongjiang Wei

TL;DR

Supervised deep-learning reconstruction of heavily undersampled dynamic MRI depends on large amounts of ground-truth data, which limits where it can be applied. The paper introduces an unsupervised implicit neural representation (INR) framework that models the dynamic image sequence as a continuous function of spatiotemporal coordinates via a hash-encoded MLP, trained solely from $(k,t)$-space data with a data-consistency loss and explicit low-rank and sparsity regularizers. On retrospective cardiac cine data it improves PSNR by 5.5 to 7.1 dB over the compared scan-specific, CS-based baselines at acceleration factors up to 41.6, and it provides 4× temporal super-resolution without retraining. Because the network is fitted to each scan's own data, the approach needs no labeled training dataset, offering a practical pathway to higher temporal resolution in dynamic MRI.

Abstract

Supervised deep learning (DL)-based reconstruction algorithms have shown state-of-the-art results for highly undersampled dynamic magnetic resonance imaging (MRI). However, their reliance on large amounts of high-quality ground-truth data, together with limited generalization, hinders their application. Recently, implicit neural representation (INR) has emerged as a powerful DL-based tool for solving inverse problems: it characterizes the attributes of a signal as a continuous function of the corresponding coordinates in an unsupervised manner. In this work, we propose an INR-based method that improves dynamic MRI reconstruction from highly undersampled k-space data and takes only spatiotemporal coordinates as inputs. Specifically, the proposed INR represents the dynamic MRI images as an implicit function and encodes them into a neural network. The weights of the network are learned solely from the sparsely acquired (k, t)-space data, without external training datasets or prior images. Benefiting from the strong implicit continuity regularization of INR together with explicit regularization for low-rankness and sparsity, the proposed method outperforms the compared scan-specific methods at various acceleration factors. For example, experiments on retrospective cardiac cine datasets show a PSNR improvement of 5.5 to 7.1 dB at extremely high accelerations (up to 41.6-fold). The high quality and intrinsic continuity of the images provided by INR have great potential to further improve the spatiotemporal resolution of dynamic MRI, without the need for any training data.
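The objective described in the abstract (data consistency plus explicit low-rank and sparsity regularization, with sparsity applied as temporal total variation per the Figure 1 caption) can be illustrated with a minimal NumPy sketch. Several things here are assumptions rather than the paper's implementation: a masked Cartesian FFT stands in for the golden-angle radial undersampled Fourier transform, the squared-error form of the data term and the weights `lam_tv` and `lam_lr` are placeholders, and all function names are hypothetical. In the actual method the image series would be the output of the hash grids and MLP, and the loss would be minimized with respect to their parameters.

```python
import numpy as np

def forward_masked_fft(images, mask):
    """Stand-in undersampled Fourier operator (assumption: Cartesian mask,
    not the radial NUFFT used in the paper).
    images: complex array (T, H, W); mask: boolean array (T, H, W)."""
    kspace = np.fft.fft2(images, axes=(-2, -1), norm="ortho")
    return kspace * mask

def data_consistency(images, kspace_acquired, mask):
    """Mean squared difference between predicted and acquired k-space samples."""
    residual = forward_masked_fft(images, mask) - kspace_acquired
    return np.mean(np.abs(residual) ** 2)

def temporal_tv(images):
    """Temporal total variation: mean magnitude of frame-to-frame differences."""
    return np.mean(np.abs(np.diff(images, axis=0)))

def nuclear_norm(images):
    """Low-rank penalty: sum of singular values of the Casorati matrix,
    whose columns are the vectorized frames."""
    n_frames = images.shape[0]
    casorati = images.reshape(n_frames, -1).T  # shape (H*W, T)
    return np.linalg.svd(casorati, compute_uv=False).sum()

def total_loss(images, kspace_acquired, mask, lam_tv=0.1, lam_lr=0.01):
    """Data consistency plus weighted regularizers (weights are placeholders)."""
    return (data_consistency(images, kspace_acquired, mask)
            + lam_tv * temporal_tv(images)
            + lam_lr * nuclear_norm(images))

# Toy example: a static (rank-1) series has zero temporal TV.
rng = np.random.default_rng(0)
frame = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
static_series = np.repeat(frame[None], 4, axis=0)
sampling_mask = rng.random((4, 8, 8)) < 0.3
acquired = forward_masked_fft(static_series, sampling_mask)
loss_value = total_loss(static_series, acquired, sampling_mask)
```

For the static toy series the data term and the temporal TV term both vanish, so `loss_value` reduces to the weighted nuclear norm. A series with motion between frames would be penalized by both regularizers.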

Paper Structure (17 sections, 12 equations, 5 figures)

Introduction
Method
Dynamic MRI with regularizers
INR in dynamic MRI
Continuous mapping function with MLP and hash encoding
Loss functions
Implementation details
Experiments and results
Setup
Datasets
Performance evaluation
Reconstruction performance of the proposed method
Cardiac cine dataset
DCE liver dataset
Results of the temporal super-resolution
...and 2 more sections

Figures (5)

Figure 1: Overview of the proposed method. All the spatiotemporal coordinates are fed into hash grids and an MLP to output two-channel intensities as the real and imaginary parts of the image series. The predicted k-space data are generated with the undersampled Fourier Transform (a golden-angle radial undersampling pattern) from the reconstructed complex-valued images following Eq. \ref{forward}. The difference between the predicted k-space data and acquired k-space data is calculated as the data consistency loss. Two regularization terms, temporal Total Variation and low-rankness, are applied to the output image series in the loss function. The parameters in the hash grids and the MLP are updated iteratively by minimizing the loss function.
Figure 2: The reconstruction results of NUFFT, L+S, GRASP and the proposed method (from left to right) on the cardiac cine dataset with 21 and 13 spokes per frame (AF = 9.9 and 16). The enlarged views of the heart region are outlined by the orange boxes, and the red arrows point to structures where the proposed method gives a superior reconstruction. The y-t images (the 116th slice along the y and temporal dimensions) are outlined by green boxes. The error maps and PSNR/SSIM metrics are shown at the bottom.
Figure 3: The comparison of the reconstruction results on the cardiac cine dataset with 8 and 5 spokes per frame, which corresponds to the acceleration factors of 26 and 41.6. Zoomed-in views of the heart chambers are outlined by orange boxes and the y-t images (the 116th slice along y and temporal dimensions) are outlined by green boxes. The difference map between the reconstructed image and ground truth and PSNR/SSIM metrics are also shown.
Figure 4: The comparison of the reconstruction results and ROI analysis among different methods on the DCE liver dataset with 34 spokes per frame (AF=11.3). (a) Reconstruction results at different contrast phases are visualized. In the zoomed-in areas outlined by orange boxes, the proposed method gives the best image quality with the least noise among the compared methods. (b) Signal intensity-time curves of different methods are compared in the aorta (AO) and portal vein (PV) areas, with the NUFFT result serving as the temporal fidelity reference.
Figure 5: (a) The pipeline of temporal super-resolution for the reconstructed dynamic MRI. Given denser coordinates, the optimized function (hash grids and MLP) outputs the interpolated frames. (b) The upsampled images between Frame 10 and Frame 11 of the cardiac cine dataset with 21 spokes per frame. Three equally spaced coordinates (10.25, 10.5, 10.75) between Frame 10 and Frame 11 are fed to the network for temporal super-resolution $(4\times)$. The ground truth of Frame 10 and Frame 11 and the linearly interpolated frames serve as the reference. The reference and output images at the positions of Frames 10 and 11 are outlined with orange boxes. The corresponding error maps are displayed at the bottom.
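
The temporal super-resolution step in the Figure 5 caption amounts to querying the already optimized function at denser time coordinates. The sketch below, in NumPy, only shows how such query coordinates and the linear-interpolation reference could be formed. The helper names, the (x, y, t) column layout, and the use of raw pixel indices as spatial coordinates are assumptions for illustration; the trained hash grids and MLP themselves are not reproduced here.

```python
import numpy as np

def upsampled_frame_coords(t_start, t_end, factor):
    """Time coordinates from t_start to t_end (inclusive) with `factor`
    steps per original frame interval, e.g. factor=4 gives steps of 0.25."""
    n_points = int(round((t_end - t_start) * factor)) + 1
    return np.linspace(t_start, t_end, n_points)

def spatiotemporal_queries(height, width, t):
    """All (x, y, t) coordinates for one time point, shape (height*width, 3).
    Assumption: pixel indices are used directly; a real network would
    typically expect normalized coordinates."""
    xs, ys = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    ts = np.full(xs.shape, float(t))
    return np.stack([xs, ys, ts], axis=-1).reshape(-1, 3)

def linear_interpolation(frame_a, frame_b, t_a, t_b, t):
    """Reference baseline: pixel-wise linear blend of two neighboring frames."""
    weight = (t - t_a) / (t_b - t_a)
    return (1.0 - weight) * frame_a + weight * frame_b

# 4x temporal super-resolution between Frame 10 and Frame 11.
coords = upsampled_frame_coords(10, 11, 4)
intermediate = coords[1:-1]  # the three new time points
queries = [spatiotemporal_queries(4, 4, t) for t in intermediate]
```

Each array in `queries` would be passed to the optimized network to obtain one interpolated frame, while `linear_interpolation` produces the reference frames that the caption compares against.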
