Out-of-Distribution Detection and Data Drift Monitoring using Statistical Process Control
Ghada Zamzmi, Kesavan Venkatesh, Brandon Nelson, Smriti Prathapan, Paul H. Yi, Berkman Sahiner, Jana G. Delfino
TL;DR
This work addresses data drift and out-of-distribution (OOD) inputs in clinical imaging AI by introducing a framework that combines ML-based OOD detection with Statistical Process Control (SPC) charts. By evaluating feature representations (autoencoder, supervised training with binary cross-entropy (BCE), and contrastive learning) and distance metrics (Mahalanobis distance and cosine similarity), the authors demonstrate that SPC-based monitoring can both flag individual OOD images and track drift over time in a modality-agnostic manner. The approach is validated on computed tomography (CT) and chest X-ray (CXR) tasks, showing high detection performance (e.g., CT: up to 0.913 accuracy with contrastive features; CXR: 0.995 accuracy with supervised VGG16). CUSUM charts identify drift rapidly, while 3σ control limits are effective for per-image detection. These results suggest that SPC-based OOD monitoring can be tailored to specific clinical contexts and used to trigger recalibration or retraining, thereby enhancing the safety and reliability of medical imaging AI systems.
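To make the per-image check concrete, the following is a minimal sketch of Mahalanobis-distance scoring with a 3σ upper control limit. It is not the authors' implementation: the helper names (`fit_reference`, `mahalanobis`, `upper_control_limit`), the 8-dimensional synthetic features, and the choice to set the limit at the mean plus three standard deviations of the in-distribution distances are all assumptions made for illustration.

```python
import numpy as np

def fit_reference(features):
    """Estimate mean and inverse covariance from in-distribution feature vectors."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    # The pseudo-inverse guards against a singular covariance matrix.
    return mean, np.linalg.pinv(cov)

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of each row of x from the reference distribution."""
    diff = np.atleast_2d(x) - mean
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(squared, 0.0))

def upper_control_limit(distances, k=3.0):
    """Assumed control limit: mean + k * std of in-distribution distances."""
    return distances.mean() + k * distances.std()

# Synthetic stand-in for features extracted from in-distribution images.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 8))

mean, cov_inv = fit_reference(train)
ucl = upper_control_limit(mahalanobis(train, mean, cov_inv))

# A feature vector far from the reference mean exceeds the limit and is flagged.
far = np.full((1, 8), 6.0)
is_ood = mahalanobis(far, mean, cov_inv) > ucl
```

In practice the feature vectors would come from whichever encoder is being evaluated, and the multiplier `k` would be tuned to the false-alarm rate that is acceptable for the clinical setting.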
Abstract
Background: Machine learning (ML) methods often fail with data that deviates from their training distribution. This is a significant concern for ML-enabled devices in clinical settings, where data drift may cause unexpected performance that jeopardizes patient safety. Method: We propose an ML-enabled Statistical Process Control (SPC) framework for out-of-distribution (OOD) detection and drift monitoring. SPC is advantageous as it visually and statistically highlights deviations from the expected distribution. To demonstrate the utility of the proposed framework for monitoring data drift in radiological images, we investigated different design choices, including methods for extracting feature representations, drift quantification, and SPC parameter selection. Results: We demonstrate the effectiveness of our framework for two tasks: 1) differentiating axial vs. non-axial computed tomography (CT) images and 2) separating chest X-ray (CXR) from other modalities. For both tasks, we achieved high accuracy in detecting OOD inputs, with 0.913 in CT and 0.995 in CXR, and sensitivity of 0.980 in CT and 0.984 in CXR. Our framework was also adept at monitoring data streams and identifying the time a drift occurred. In a simulation with 100 daily CXR cases, we detected a drift in OOD input percentage from 0-1% to 3-5% within two days, maintaining a low false-positive rate. Through additional experimental results, we demonstrate the framework's data-agnostic nature and independence from the underlying model's structure. Conclusion: We propose a framework for OOD detection and drift monitoring that is agnostic to data, modality, and model. The framework is customizable and can be adapted for specific applications.
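The drift-monitoring simulation can be illustrated with a one-sided CUSUM chart over the daily OOD percentage. The sketch below is an assumption-laden illustration, not the paper's code: the function `cusum_first_alarm`, the 30-day baseline and drift periods, and the textbook parameter choices (allowance k = 0.5σ, decision threshold h = 4σ) are hypothetical; only the 100 cases per day and the 0-1% and 3-5% OOD ranges come from the abstract.

```python
import numpy as np

def cusum_first_alarm(values, target, k, h):
    """Return the index of the first upper-CUSUM alarm, or None if none occurs."""
    s = 0.0
    for t, x in enumerate(values):
        # Accumulate only deviations above target + k; reset at zero otherwise.
        s = max(0.0, s + (x - target - k))
        if s > h:
            return t
    return None

rng = np.random.default_rng(0)
cases_per_day = 100

# Assumed stream: 30 days at a 0-1% OOD rate, then 30 days at a 3-5% OOD rate.
baseline_rate = rng.uniform(0.00, 0.01, size=30)
drifted_rate = rng.uniform(0.03, 0.05, size=30)
rates = np.concatenate([baseline_rate, drifted_rate])
daily_pct = 100.0 * rng.binomial(cases_per_day, rates) / cases_per_day

# Estimate the in-control mean and spread from the baseline period.
reference = daily_pct[:30]
target, sigma = reference.mean(), reference.std()

# Index of the first day on which the chart signals (None if it never does).
alarm_day = cusum_first_alarm(daily_pct, target, k=0.5 * sigma, h=4.0 * sigma)
```

Smaller values of `k` and `h` shorten the detection delay at the cost of more false alarms during the in-control period, which is the trade-off that SPC parameter selection has to balance.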
