Standardizing Medical Images at Scale for AI

Callen MacPhee, Yiming Zhou, Koichiro Kishima, Bahram Jalali

Abstract

Deep learning has achieved remarkable success in medical image analysis, yet its performance remains highly sensitive to the heterogeneity of clinical data. Differences in imaging hardware, staining protocols, and acquisition conditions produce substantial domain shifts that degrade model generalization across institutions. Here we present a physics-based data preprocessing framework built on the PhyCV (Physics-Inspired Computer Vision) family of algorithms, which standardizes medical images through deterministic transformations derived from optical physics. The framework models images as spatially varying optical fields that undergo virtual diffractive propagation followed by coherent phase detection. This process suppresses non-semantic variability such as color and illumination differences while preserving diagnostically relevant texture and structural features. When applied to histopathological images from the Camelyon17-WILDS benchmark, PhyCV preprocessing improves out-of-distribution breast-cancer classification accuracy from 70.8% (Empirical Risk Minimization baseline) to 90.9%, matching or exceeding data-augmentation and domain-generalization approaches at negligible computational cost. Because the transform is physically interpretable, parameterizable, and differentiable, it can be deployed as a fixed preprocessing stage or integrated into end-to-end learning. These results establish PhyCV as a generalizable data refinery for medical imaging, one that harmonizes heterogeneous datasets through first-principles physics, improving robustness, interpretability, and reproducibility in clinical AI systems.
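
The two steps named in the abstract, virtual diffractive propagation followed by coherent phase detection, can be illustrated with a minimal NumPy sketch. This is not the PhyCV library's API or the authors' exact transform. The function name `virtual_diffraction_phase`, the quadratic spectral phase kernel, and the `strength` and `bias` parameters are all assumptions chosen for illustration.

```python
import numpy as np

def virtual_diffraction_phase(image, strength=0.5, bias=0.1):
    """Illustrative sketch (hypothetical, not the PhyCV API).

    Treats a 2-D grayscale image as an optical field, applies a quadratic
    spectral phase (a stand-in for diffractive propagation), and returns
    the phase of the resulting complex field.
    """
    img = np.asarray(image, dtype=np.float64)

    # Min-max normalization to [0, 1]; a constant image maps to all zeros.
    span = img.max() - img.min()
    img = (img - img.min()) / span if span > 0 else np.zeros_like(img)

    # Squared radial spatial frequency in cycles per pixel.
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    rho2 = fx**2 + fy**2

    # Assumed kernel: phase grows quadratically with frequency.
    # rho2 is at most 0.5, so the phase is at most `strength` radians.
    kernel = np.exp(-1j * strength * rho2 / 0.5)

    # "Propagation": multiply the spectrum by the phase kernel.
    field = np.fft.ifft2(np.fft.fft2(img) * kernel)

    # "Coherent detection": read out the phase of the complex field.
    # `bias` (an assumption) regularizes the phase where the field is dark.
    return np.arctan2(field.imag, field.real + bias)
```

In this sketch the normalization step removes any global gain or offset applied to the input, and the output is bounded to the interval from -π to π regardless of the input's dynamic range. A color image would need to be reduced to a single channel first; how the authors handle color is not specified here.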

Paper Structure

This paper contains 6 sections, 9 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Examples of physics-inspired computer vision algorithms in PhyCV. The PhyCV framework includes modules for diverse imaging tasks: super-resolution (PhSAR) enhances structural detail in low-resolution MRI scans; edge and orientation extraction (PAGE) reveals vascular and textural organization in retinal imagery; and exposure enhancement (VEViD) restores visibility in low-contrast microscopy images. Together, these algorithms demonstrate the versatility of physics-based models for image refinement and feature extraction across modalities.
  • Figure 2: Conceptual overview of PhyCV for data refinement and standardization. During training (top), heterogeneous data from multiple hospitals are refined into standardized feature representations before being used to train a neural network. During inference (bottom), unseen data undergo the same refinement to ensure consistent predictions.
  • Figure 3: PhyCV improves robustness under non-uniform illumination. Top: Non-uniform illumination is studied by applying 6 levels of illumination (linear) to a single induced pluripotent stem cell (iPS) test image from okamoto2011induction. For each of the 6 illumination levels, the center region is used for texture analysis. Bottom: Corresponding image entropy, showing that PhyCV maintains information content even under severe illumination degradation. When information is fully lost, as in image 6, there is a noticeable loss in the PhyCV output, suggesting that it can also flag defective or untrustworthy images.
  • Figure 4: Cross-institutional variability and PhyCV-based refinement. Example histopathology patches from five hospitals showing differences in staining and contrast (top). After PhyCV refinement (bottom), features are standardized, enhancing consistency across sites while preserving tissue structure.
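
The image entropy reported in Figure 3 is not defined in the caption. A common choice, assumed here, is the Shannon entropy of the gray-level histogram, which can be sketched as follows. The function name `image_entropy` and the default of 256 bins are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def image_entropy(image, bins=256):
    """Shannon entropy (in bits) of an image's intensity histogram.

    Assumed definition for illustration; the paper may use a different one.
    """
    values = np.asarray(image, dtype=np.float64).ravel()

    # Histogram over the image's own intensity range.
    counts, _ = np.histogram(values, bins=bins)

    # Convert non-empty bins to probabilities; empty bins contribute nothing.
    p = counts[counts > 0] / counts.sum()

    return float(-(p * np.log2(p)).sum())
```

Under this definition a uniform patch has zero entropy and a patch split evenly between two gray levels has one bit, so a collapse toward zero on a preprocessed image would be consistent with the information loss the caption describes for image 6.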