Dynamic-Aware Spatio-temporal Representation Learning for Dynamic MRI Reconstruction

Dayoung Baik, Jaejun Yoo

TL;DR

This work tackles the challenge of unsupervised dynamic MRI reconstruction under severe undersampling. It introduces Dynamic-Aware INR (DA-INR), a deformation-augmented, hash-encoded INR that operates in a canonical space to capture temporal redundancy without requiring hand-tuned regularization terms. DA-INR combines a deformation network, a pretrained feature extractor, and a hash-encoded canonical network to predict complex-valued frames, achieving faster convergence and improved reconstruction quality on cardiac cine and DCE liver data. The approach yields state-of-the-art results under various undersampling conditions while reducing GPU memory usage, highlighting its practical impact for efficient dynamic MRI reconstruction in clinical settings.

Abstract

Dynamic MRI reconstruction, an inverse problem, has advanced rapidly through the use of deep learning techniques. In particular, the practical difficulty of obtaining ground-truth data has led to the emergence of unsupervised learning approaches. A promising recent method among them is implicit neural representation (INR), which defines the data as a continuous function that maps coordinate values to the corresponding signal values. This makes it possible to fill in missing information from incomplete measurements alone and to solve the inverse problem effectively. Nevertheless, previous works built on this method suffer from drawbacks such as long optimization times and the need for extensive hyperparameter tuning. To address these issues, we propose Dynamic-Aware INR (DA-INR), an INR-based model for dynamic MRI reconstruction that captures the spatial and temporal continuity of dynamic MRI data in the image domain and explicitly incorporates the temporal redundancy of the data into the model structure. As a result, DA-INR outperforms other models in reconstruction quality even at extreme undersampling ratios, while significantly reducing optimization time and requiring minimal hyperparameter tuning.
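To make the INR idea in the abstract concrete, the following is a minimal sketch of a coordinate network that maps $(x,y,t)$ to a complex-valued signal. It is an illustration under stated assumptions, not the DA-INR model: the Fourier-feature encoding, the layer sizes, and the helper names (`fourier_features`, `init_mlp`, `inr_forward`) are all hypothetical, the weights are random, and no optimization is performed.

```python
import numpy as np

def fourier_features(coords, num_freqs=4):
    """Encode (N, 3) coordinates in [0, 1] with sin/cos at octave frequencies.

    Assumption: a plain Fourier-feature encoding stands in for whatever
    encoding a real INR would use.
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi            # (F,)
    angles = coords[:, :, None] * freqs                       # (N, 3, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(coords.shape[0], -1)                   # (N, 3 * 2F)

def init_mlp(rng, sizes):
    """Random (weight, bias) pairs for a fully connected network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def inr_forward(params, coords):
    """Map (N, 3) coordinates (x, y, t) to N complex signal values."""
    h = fourier_features(coords)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)                        # ReLU hidden layers
    w, b = params[-1]
    out = h @ w + b                                           # (N, 2): real, imag
    return out[:, 0] + 1j * out[:, 1]

rng = np.random.default_rng(0)
params = init_mlp(rng, [24, 64, 64, 2])                       # 24 = 3 coords * 2 * 4 freqs

# Query an 8x8 spatial grid at time t = 0.5.
ys, xs = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel(), np.full(64, 0.5)], axis=1)
frame = inr_forward(params, coords).reshape(8, 8)

# Because the representation is continuous, off-grid coordinates work too.
value = inr_forward(params, np.array([[0.123, 0.456, 0.789]]))
```

Since the function is defined for every coordinate, a frame can be queried at any spatial resolution or time point; fitting the weights to undersampled measurements is what turns this into a reconstruction method.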

Paper Structure

This paper contains 29 sections, 10 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: DA-INR model architecture. A deformation network $\Psi_t$ takes a spatio-temporal coordinate $(x,y,t)$ as input and outputs a deformation field $\Delta \mathbf{x}=(\Delta x, \Delta y)$ relative to a canonical space. A pretrained feature extractor extracts features from undersampled data in the image domain. A canonical network $\Psi_x$ takes the deformed coordinate $\mathbf{x}'$ and the features $\mathbf{f}'$ to predict the $t^{th}$ frame in the image domain, $d_\theta$. The two networks are optimized with an L1 loss computed in the frequency domain via the Non-uniform Fast Fourier Transform (NuFFT).
  • Figure 2: Visual comparison of (a) Feng et al. and (b) DA-INR. Feng et al. represent dynamic MRI data as a three-dimensional measurement with an $x$-axis, a $y$-axis, and a $t$-axis, so that each coordinate $(x,y,t)$ maps directly to a cell in the image at time $t$. In contrast, in DA-INR the cells of the image in the canonical space play a regularization role for those of all other frames. The purplish lines between frames in (b) indicate that DA-INR is continuous in time but does not merely represent dynamic MRI data as a single 3D volume.
  • Figure 3: Sensitivity analysis of Feng et al. [feng2023spatiotemporal] on cardiac cine data. "R2" denotes the use of relative L2 loss [relativel2loss] for data-consistency optimization, while "L1" refers to the use of L1 loss. $\lambda_L$ and $\lambda_S$ are the weights for low-rank and temporal TV regularization, respectively. Lastly, "$step>20$" denotes that temporal TV regularization is turned on after training step $20$.
  • Figure 4: Visual comparison of cardiac cine reconstructions at $AF=9.8$, at diastole and systole. The upper row shows the reconstruction output in the $(y-x)$ domain for each method, and the lower row shows the absolute error map between the ground truth and the reconstructed output of each method. PSNR values are specific to each frame.
  • Figure 5: Visual comparison of cardiac cine reconstructions at $AF=25.6$, at diastole and systole. The upper row shows the reconstruction output in the $(y-x)$ domain for each method, and the lower row shows the absolute error map between the ground truth and the reconstructed output of each method. PSNR values are specific to each frame.
  • ...and 4 more figures
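
The data flow described in the Figure 1 caption can be sketched roughly as follows. This is a simplified illustration under several loud assumptions, not the authors' implementation: small dense MLPs replace the hash-encoded canonical network, a zero-filled reconstruction serves as a hypothetical stand-in for the pretrained feature extractor, a Cartesian FFT with a binary sampling mask is swapped in for NuFFT, and all names (`mlp`, `run_mlp`, `da_inr_frame`, `kspace_l1_loss`) are invented for this example. Weights are random and no training loop is shown.

```python
import numpy as np

def mlp(rng, sizes):
    """Random (weight, bias) pairs; a dense stand-in for the real networks."""
    return [(rng.normal(0.0, np.sqrt(1.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def run_mlp(params, h):
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return h @ w + b

def da_inr_frame(deform_params, canon_params, xy, t, feats):
    """Predict one complex frame at time t for pixel coordinates xy of shape (N, 2)."""
    xyt = np.concatenate([xy, np.full((xy.shape[0], 1), t)], axis=1)   # (N, 3)
    delta = run_mlp(deform_params, xyt)          # deformation field (dx, dy)
    xy_canon = xy + delta                        # deformed coordinate in canonical space
    out = run_mlp(canon_params, np.concatenate([xy_canon, feats], axis=1))
    return out[:, 0] + 1j * out[:, 1]            # (N,) complex pixel values

def kspace_l1_loss(frame, measured_kspace, mask):
    """L1 data-consistency loss on sampled k-space locations only.

    Assumption: Cartesian FFT plus a boolean mask, used here instead of NuFFT.
    """
    pred_k = np.fft.fft2(frame, norm="ortho")
    return np.abs((pred_k - measured_kspace)[mask]).mean()

rng = np.random.default_rng(0)
H = W = 8
ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
xy = np.stack([xs.ravel(), ys.ravel()], axis=1)

# Synthetic undersampled measurement for one time frame (illustrative data only).
mask = rng.random((H, W)) < 0.3
mask[0, 0] = True                                # always keep the DC sample
measured = (rng.normal(size=(H, W)) + 1j * rng.normal(size=(H, W))) * mask

# Hypothetical features: real and imaginary parts of the zero-filled reconstruction.
zero_filled = np.fft.ifft2(measured, norm="ortho")
feats = np.stack([zero_filled.real.ravel(), zero_filled.imag.ravel()], axis=1)

deform_params = mlp(rng, [3, 32, 2])             # (x, y, t) -> (dx, dy)
canon_params = mlp(rng, [4, 32, 2])              # (x', y', f1, f2) -> (real, imag)

frame = da_inr_frame(deform_params, canon_params, xy, 0.25, feats).reshape(H, W)
loss = kspace_l1_loss(frame, measured, mask)
```

In an actual optimization, the gradient of this loss with respect to both networks would be followed jointly, so every frame is explained by the same canonical image plus a time-dependent deformation.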