Robustness of Deep Learning for Accelerated MRI: Benefits of Diverse Training Data

Kang Lin, Reinhard Heckel

TL;DR

This work examines how training data diversity affects the robustness of deep learning reconstructions for accelerated MRI under distribution shifts across anatomy, contrast, scanners, and forward-model settings. By comparing joint versus separate training across multiple architectures (U-net, ViT, VarNet) on a large, heterogeneous dataset, the authors show that diverse training yields out-of-distribution gains without sacrificing in-distribution performance, and that robustness correlates with the similarity between train and test distributions, as measured by CLIP-based metrics. They also reveal distributional overfitting and demonstrate pathology reconstruction and generalization from healthy data, extending findings beyond the fastMRI dataset to a broader set of 13 datasets. The practical takeaway is that one robust MRI reconstruction model trained on diverse data can match or outperform multiple specialized models, though careful training with early stopping is essential to preserve that robustness.

Abstract

Deep learning based methods for image reconstruction are state-of-the-art for a variety of imaging tasks. However, neural networks often perform worse if the training data differs significantly from the data they are applied to. For example, a model trained for accelerated magnetic resonance imaging (MRI) on one scanner performs worse on another scanner. In this work, we investigate the impact of the training data on a model's performance and robustness for accelerated MRI. We find that models trained on the combination of various data distributions, such as those obtained from different MRI scanners and anatomies, exhibit robustness equal or superior to models trained on the best single distribution for a specific target distribution. Thus training on such diverse data tends to improve robustness. Furthermore, training on such a diverse dataset does not compromise in-distribution performance, i.e., a model trained on diverse data yields in-distribution performance at least as good as models trained on the more narrow individual distributions. Our results suggest that training a model for imaging on a variety of distributions tends to yield a more effective and robust model than maintaining separate models for individual distributions.

Paper Structure (40 sections, 1 equation, 22 figures, 4 tables)

Figures (22)

  • Figure 1: An illustrative (randomly chosen) example demonstrating the benefits of training on a large and diverse dataset. Shown are reconstructions from two VarNets (Sriram et al., 2020), one trained on fastMRI brain, the largest single dataset of brain images for accelerated MRI, and one trained on a diverse collection of datasets $\mathcal{D}_P$. Both models are evaluated out-of-distribution on an image from the CC-359-sagittal dataset (Souza et al., 2018). The model trained on fastMRI brain shows severe artifacts, whereas the model trained on $\mathcal{D}_P$ provides better details and fewer artifacts.
  • Figure 2: Example images for a selection of distributions from the fastMRI dataset (Zbontar et al., 2019) that we consider here. Axial view brain images are on the left, coronal view knee images are on the right. The caption above an image describes the image contrast, and the caption below is the name of the MRI scanner used.
  • Figure 3: The orange and blue bars are the U-net models trained exclusively on data from $P$ ($\mathcal{D}_P$) and $Q$ ($\mathcal{D}_Q$), respectively, and the teal bars are the models trained on both sets $\mathcal{D}_P \cup \mathcal{D}_Q$. As a reference point, the black bars are the performance of models trained on random samples of $\mathcal{D}_P \cup \mathcal{D}_Q$ of half the size. Below each bar is the total number of training images. We are in the high-data regime where increasing the dataset further gives only minor improvements. For all distributions, the joint model trained on $P$ and $Q$ performs as well on $P$ and $Q$ as the models trained individually for each of those distributions.
  • Figure 4: Training a single model (here U-net) on a slightly skewed dataset does not harm performance on the individual data distributions. The number below each bar is the number of training examples used. We report the mean $\pm$ two standard deviations from five runs, each with a different random seed for sampling training data from $P$ and model initialization. We note that for training sets exceeding 3k images there is next to no variation (see Figure \ref{fig:sep_unet_ci} in the Appendix); therefore, we show error bars only for this experiment, which includes training runs on small datasets.
  • Figure 5: For a distribution-shift from distributions $P = \{P_1, \ldots, P_m\}$ to distribution $Q$, we compare robustness of models trained on $P$ (orange) to baselines trained on the best single distribution $P_\text{best}$ (violet). As additional reference, we also report models trained on both $P$ and $Q$ to imitate ideally robust models (teal). For the three distribution-shifts shown, training on the more diverse dataset $P$ is beneficial compared to training on $P_\text{best}$ alone.
  • ...and 17 more figures