Scalable Drift Monitoring in Medical Imaging AI

Jameson Merkow, Felix J. Dorfner, Xiyu Yang, Alexander Ersoy, Giridhar Dasegowda, Mannudeep Kalra, Matthew P. Lungren, Christopher P. Bridge, Ivan Tarapov

TL;DR

MMC+ is an enhanced framework for scalable drift monitoring that builds on CheXstray, which introduced real-time drift detection for medical imaging AI models using multi-modal data concordance. MMC+ provides a more scalable and adaptable solution for real-world healthcare settings and offers a reliable, cost-effective alternative to continuous performance monitoring.

Abstract

The integration of artificial intelligence (AI) into medical imaging has advanced clinical diagnostics but poses challenges in managing model drift and ensuring long-term reliability. To address these challenges, we develop MMC+, an enhanced framework for scalable drift monitoring that builds on the CheXstray framework, which introduced real-time drift detection for medical imaging AI models using multi-modal data concordance. This work extends the original framework's methodologies, providing a more scalable and adaptable solution for real-world healthcare settings and a reliable, cost-effective alternative to continuous performance monitoring that addresses limitations of both continuous and periodic monitoring methods. MMC+ introduces critical improvements to the original framework: more robust handling of diverse data streams; improved scalability through the integration of foundation models such as MedImageInsight, which provide high-dimensional image embeddings without site-specific training; and uncertainty bounds that better capture drift in dynamic clinical environments. Validated with real-world data from Massachusetts General Hospital during the COVID-19 pandemic, MMC+ effectively detects significant data shifts and correlates them with changes in model performance. Although it does not directly predict performance degradation, MMC+ serves as an early warning system, indicating when AI systems may deviate from acceptable performance bounds and enabling timely interventions. By emphasizing the importance of monitoring diverse data streams and evaluating data shifts alongside model performance, this work contributes to the broader adoption and integration of AI solutions in clinical settings.
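The abstract describes MMC+ only at a high level: each data stream in a detection window is compared against a reference set, the per-component results are normalized and weighted into one score, and an uncertainty range is attached. The following is a minimal sketch of that general idea, not the authors' implementation. The use of the two-sample Kolmogorov-Smirnov statistic, the standardization against windows drawn from the reference period, the bootstrap percentile range, and every name (`drift_score`, `reference_baseline`, `bootstrap_range`, the component keys) are assumptions chosen for illustration. All components here are numeric; categorical metadata would need a different test (for example a chi-squared test).

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the maximum gap is attained at an observed value
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def reference_baseline(reference, window_size, n_windows, rng):
    """Hypothetical normalization step: mean and std of each component's KS statistic
    for sub-windows drawn from the reference period itself (an approximate no-drift spread)."""
    n_rows = len(next(iter(reference.values())))
    samples = {name: [] for name in reference}
    for _ in range(n_windows):
        idx = rng.choice(n_rows, size=window_size, replace=False)
        for name, values in reference.items():
            samples[name].append(ks_statistic(values, values[idx]))
    # Small epsilon guards against a zero standard deviation.
    return {name: (float(np.mean(s)), float(np.std(s)) + 1e-8) for name, s in samples.items()}

def drift_score(reference, window, baseline, weights):
    """Weighted mean of per-component KS statistics, each standardized against the baseline."""
    total = 0.0
    for name, weight in weights.items():
        mu, sd = baseline[name]
        total += weight * (ks_statistic(reference[name], window[name]) - mu) / sd
    return total / sum(weights.values())

def bootstrap_range(reference, window, baseline, weights, rng, n_boot=200, q=(5, 95)):
    """Percentile range of the score when detection-window rows are resampled with replacement."""
    n_rows = len(next(iter(window.values())))
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_rows, size=n_rows)
        resampled = {name: values[idx] for name, values in window.items()}
        scores.append(drift_score(reference, resampled, baseline, weights))
    lo, hi = np.percentile(scores, q)
    return float(lo), float(hi)

# Usage on synthetic data (hypothetical components standing in for the three categories).
rng = np.random.default_rng(0)

def make_rows(n, shift=0.0):
    return {
        "appearance_pc1": rng.normal(shift, 1.0, n),        # one image-embedding coordinate
        "pred_score": rng.beta(2.0 + shift, 5.0, n),        # classifier output probability
        "patient_age": rng.normal(60 + 10 * shift, 15, n),  # a numeric metadata field
    }

reference = make_rows(2000)
weights = {"appearance_pc1": 1.0, "pred_score": 1.0, "patient_age": 1.0}
baseline = reference_baseline(reference, window_size=200, n_windows=50, rng=rng)
stable = drift_score(reference, make_rows(200), baseline, weights)
shifted_window = make_rows(200, shift=1.5)
shifted = drift_score(reference, shifted_window, baseline, weights)
low, high = bootstrap_range(reference, shifted_window, baseline, weights, rng)
```

A window drawn from the reference distribution scores near zero, while the shifted window scores far higher. Standardizing each component before weighting keeps one stream (for example, high-variance metadata) from dominating the aggregate purely because of its scale.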

Paper Structure

This paper contains 21 sections, 9 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Illustration of the CheXstray framework for calculating the MMC+ metric. MMC+ is calculated by comparing a reference set with a detection window from ongoing data streams to assess distribution similarity. High MMC+ values indicate a potential deviation outside acceptable performance bounds. The upper panel shows the steps for calculating MMC+ from components in three high-level categories: Appearance, Model Predictions, and Metadata. Each component's distribution is compared against the reference set to calculate a similarity metric; these metrics are then normalized, weighted, and aggregated into the MMC+ metric. The lower panel presents an example of MMC+ over time alongside known performance values and acceptable performance bands.
  • Figure 2: Plot of the evolution of MMC+ over time from 2019-11 to 2021-07. The red line shows the weighted MMC+ value, and the gray shaded area shows the uncertainty range around it. Two dashed vertical lines are shown: the first marks the start of the test data, and the second marks March 10, 2020, the day Massachusetts declared a state of emergency. Two arrows mark the first windows composed entirely of test data and entirely of post-state-of-emergency data, respectively; because each window spans a fixed duration, these windows lag behind the corresponding dashed lines.
  • Figure 3: Plot of the evolution of MMC+ sub-components over time from 2019-11 to 2021-07. MMC+ is composed of three high-level components that capture data drift: a) the appearance metric represents changes in the appearance of medical images as encoded by the MedImageInsight foundation model, b) the model prediction metric captures shifts in the outputs of a trained classifier over time, and c) the metadata metric measures variations in the metadata extracted from DICOM files and the radiology information system (RIS).
  • Figure 4: Plot of model performance (AUROC) for the nine findings and the macro/micro averages. AUROC is shown in blue; for each finding, the average performance during the reference period is shown as a solid grey line and the ±3 standard deviation range around it as a grey shaded area.
  • Figure 5: Relationship between MMC+ and normalized AUROC (AUROC expressed in standard deviations from the reference-period mean) for each finding. In each plot, blue points represent test set data before March 10, 2020, and orange points represent data after this date. The horizontal dotted lines mark $\pm3$ standard deviations of the AUROC, and the vertical line at $\mathrm{MMC}+ = 10$ marks the threshold above which drift is observed. Above each scatter plot, kernel density estimate (KDE) plots show the distributions of MMC+ and normalized AUROC for test data before and after March 10 (blue for pre, orange for post), highlighting the separation between the two periods in both MMC+ and AUROC.
  • ...and 4 more figures
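Figures 4 and 5 compare each window's AUROC with the reference-period mean and its ±3 standard deviation band, and mark MMC+ = 10 as the drift threshold. A minimal sketch of that bookkeeping follows, assuming per-window AUROC values are already available for both the reference period and the test windows. The function names and the quadrant labels (`true_alarm`, `false_alarm`, `unflagged_deviation`, `quiet`) are hypothetical and do not come from the paper.

```python
from statistics import mean, stdev

MMC_THRESHOLD = 10.0  # vertical line in Figure 5
SIGMA_BAND = 3.0      # ±3 standard deviations, as in Figures 4 and 5

def normalize_auroc(window_aurocs, reference_aurocs):
    """Express each window's AUROC in reference-period standard deviations (a z-score)."""
    mu = mean(reference_aurocs)
    sd = stdev(reference_aurocs)  # sample standard deviation; needs >= 2 reference windows
    return [(auroc - mu) / sd for auroc in window_aurocs]

def classify_window(mmc, z_auroc, mmc_threshold=MMC_THRESHOLD, band=SIGMA_BAND):
    """Place one window in a quadrant of the MMC+ vs. normalized-AUROC plane."""
    drift_flagged = mmc > mmc_threshold
    out_of_band = abs(z_auroc) > band
    if drift_flagged and out_of_band:
        return "true_alarm"           # drift flagged and performance outside the band
    if drift_flagged:
        return "false_alarm"          # drift flagged, performance still within the band
    if out_of_band:
        return "unflagged_deviation"  # performance outside the band without a drift flag
    return "quiet"

# Illustrative numbers only, not results from the paper.
reference_aurocs = [0.80, 0.82, 0.81, 0.79, 0.83]
window_aurocs = [0.80, 0.70]
window_mmc = [4.0, 15.0]
z_scores = normalize_auroc(window_aurocs, reference_aurocs)
labels = [classify_window(m, z) for m, z in zip(window_mmc, z_scores)]
# labels == ["quiet", "true_alarm"]
```

Because the band is symmetric, a window whose AUROC rises more than three standard deviations above the reference mean also counts as out of band; a deployment that only cares about degradation could test `z_auroc < -band` instead.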