
The Impact of Scanner Domain Shift on Deep Learning Performance in Medical Imaging: an Experimental Study

Brian Guo, Darui Lu, Gregory Szumel, Rongze Gui, Tingyu Wang, Nicholas Konz, Maciej A. Mazurowski

TL;DR

This study systematically quantifies how scanner domain shift degrades deep learning performance in medical imaging across seven public datasets spanning MRI, CT, and X-ray. By training separate models on data from two scanner manufacturers and evaluating each on both same- and cross-manufacturer test sets, the authors provide multi-modality, multi-task evidence that cross-domain performance declines in most cases, with MRI being the most sensitive and CT the least. They report average cross-domain AUC changes of about $-0.097$ for MRI, $-0.067$ for X-ray, and $-0.026$ for CT, and observe nearly lossless transfer for some CT tasks. Additional analyses show that injecting target-domain data into training can help in some cases, while adding simple Gaussian noise does not improve cross-domain generalization, which points to the need for more sophisticated domain-robust strategies. The work underscores the practical importance of accounting for scanner variability when deploying radiology-focused deep learning systems across diverse clinical settings.

Abstract

Purpose: Medical images acquired using different scanners and protocols can differ substantially in their appearance. This phenomenon, scanner domain shift, can result in a drop in the performance of deep neural networks that are trained on data acquired by one scanner and tested on another. This significant practical issue is well acknowledged; however, no systematic study of the issue is available across different modalities and diagnostic tasks. Materials and Methods: In this paper, we present a broad experimental study evaluating the impact of scanner domain shift on convolutional neural network performance for different automated diagnostic tasks. We evaluate this phenomenon in common radiological modalities, including X-ray, CT, and MRI. Results: We find that network performance on data from a different scanner is almost always worse than on same-scanner data, and we quantify the degree of performance drop across different datasets. Notably, we find that this drop is most severe for MRI, moderate for X-ray, and quite small for CT, on average, which we attribute to the standardized nature of CT acquisition systems, a property that is not present in MRI or X-ray. We also study whether injecting varying amounts of target domain data into the training set, or adding noise to the training data, helps with generalization. Conclusion: Our results provide extensive experimental evidence and quantification of the extent of performance drop caused by scanner domain shift in deep learning across different modalities, with the goal of guiding the future development of robust deep learning models for medical image analysis.
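The performance drop described in the abstract can be illustrated as the difference between a model's AUC on a cross-scanner test set and its AUC on a same-scanner test set. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names `auc` and `auc_change` and the toy labels and scores are hypothetical, and AUC is computed here with the rank-based (Mann-Whitney) formulation using only the standard library.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Counts, over all (positive, negative) pairs, how often the positive
    case receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_change(same_labels, same_scores, cross_labels, cross_scores) -> float:
    """Cross-scanner AUC minus same-scanner AUC (negative means a drop)."""
    return auc(cross_labels, cross_scores) - auc(same_labels, same_scores)


# Toy data (hypothetical): one model scored on two test sets.
same_labels, same_scores = [0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9]
cross_labels, cross_scores = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]

delta = auc_change(same_labels, same_scores, cross_labels, cross_scores)
print(f"{delta:+.3f}")  # prints -0.250
```

Under this sign convention, a negative value such as the reported averages indicates worse performance on the unseen scanner domain. In practice, a library routine such as scikit-learn's `roc_auc_score` would typically replace the hand-written `auc`.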

Paper Structure (18 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Seeded training and testing procedure. See the experimental design section (sec:expdesign) for more details.
  • Figure 2: Visualization of the change in network performance due to the test scanner domain shifting away from the training scanner domain (see the domain shift table, tab:shift, for reference). Each pair of bars is for a single modality and training set scanner manufacturer.
  • Figure 3: Box-and-whisker plot of the change in neural network classification performance due to scanner domain shift across different modalities (see the domain shift table, tab:shift). Average values shown with red lines.
  • Figure 4: Scanner domain mixing experiment results for (a) CT-Kidney with Siemens and Other training domains on the left and right, respectively; (b) CT-LIDC with GE and Other training domains on the left and right; and (c) MRI-Prostate with Siemens and Other training domains. Different colors correspond to different scanner domains for the test set. Confidence intervals (standard deviation over all runs) shown with dotted lines.
  • Figure 5: Additive noise experiment results for (a) CT-Kidney with Siemens and Other training domains on the left and right, respectively; (b) CT-LIDC with GE and Other training domains on the left and right; and (c) MRI-Prostate with Siemens and Other training domains. Different colors correspond to different scanner domains for the test set. Confidence intervals (standard deviation over all runs) shown with dotted lines.
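The two follow-up experiments in Figures 4 and 5 (scanner domain mixing and additive noise) can be sketched as simple training-set transformations. This is an illustrative sketch only. The page does not state the mixing proportions, whether the training set size is held fixed, or the noise levels used, so those choices are assumptions here, and the names `mix_domains` and `add_gaussian_noise` are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_domains(source: Sequence[T], target: Sequence[T],
                target_fraction: float, seed: int = 0) -> List[T]:
    """Build a training set in which a fraction of samples comes from the target domain.

    Assumption: the total size stays equal to len(source), so target samples
    replace source samples rather than being appended.
    """
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must lie in [0, 1]")
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    n_total = len(source)
    n_target = round(target_fraction * n_total)
    if n_target > len(target):
        raise ValueError("not enough target-domain samples for this fraction")
    mixed = rng.sample(list(target), n_target) + rng.sample(list(source), n_total - n_target)
    rng.shuffle(mixed)
    return mixed


def add_gaussian_noise(pixels: Sequence[float], sigma: float, seed: int = 0) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation sigma to each pixel value."""
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, sigma) for p in pixels]


# Toy usage with hypothetical sample identifiers and pixel values.
siemens = [f"siemens_{i}" for i in range(8)]
other = [f"other_{i}" for i in range(8)]
train_set = mix_domains(siemens, other, target_fraction=0.25)
noisy = add_gaussian_noise([0.0, 0.5, 1.0], sigma=0.05)
```

Sweeping `target_fraction` or `sigma` over a grid and re-evaluating on each test domain would yield curves of the kind the figure captions describe, with one curve per test scanner domain.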