PCA for Enhanced Cross-Dataset Generalizability in Breast Ultrasound Tumor Segmentation

Christian Schmidt; Heinrich Martin Overhoff

TL;DR

This work tackles cross-dataset generalization in breast ultrasound tumor segmentation, a task hampered by external validity issues. It proposes PCA reconstruction to reduce dataset-specific noise while preserving about $90\%$ of variance, and evaluates across six public datasets with twelve training configurations using a U-Net. Results show PCA improves recall and Dice in the most challenging out-of-domain pairs and reduces the external-validation recall decline by roughly $33\%$, with statistically significant gains in the worst-case scenarios. The findings suggest PCA preprocessing as a lightweight safeguard to enhance external validity in medical image segmentation and motivate combining PCA with other generalization strategies in real-world clinical deployments.

Abstract

In medical image segmentation, limited external validity remains a critical obstacle when models are deployed across unseen datasets, an issue particularly pronounced in the ultrasound image domain. Existing solutions, such as domain adaptation and GAN-based style transfer, are promising but often fall short in the medical domain, where datasets are typically small and diverse. This paper presents a novel application of principal component analysis (PCA) to address this limitation. PCA preprocessing reduces noise and emphasizes essential features by retaining approximately $90\%$ of the dataset variance. We evaluate our approach across six diverse breast tumor ultrasound datasets comprising 3,983 B-mode images and corresponding expert tumor segmentation masks. For each dataset, a corresponding dimensionality-reduced PCA dataset is created, and U-Net-based segmentation models are trained on each of the twelve datasets. Each model trained on an original dataset was evaluated on the remaining five out-of-domain original datasets (baseline results), while each model trained on a PCA dataset was evaluated on the five out-of-domain PCA datasets. Our experimental results indicate that using PCA-reconstructed datasets instead of original images improves the model's recall and Dice scores, particularly for model-dataset pairs where baseline performance was lowest, achieving statistically significant gains in recall (0.57 $\pm$ 0.07 vs. 0.70 $\pm$ 0.05, $p = 0.0004$) and Dice scores (0.50 $\pm$ 0.06 vs. 0.58 $\pm$ 0.06, $p = 0.03$). Our method reduced the decline in recall values due to external validation by $33\%$. These findings underscore the potential of PCA reconstruction as a safeguard to mitigate declines in segmentation performance, especially in challenging cases, with implications for enhancing external validity in real-world medical applications.
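To make the preprocessing step concrete, the following is a minimal sketch of PCA reconstruction that keeps roughly $90\%$ of the variance. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how images are resized or vectorized, so the sketch assumes equally sized grayscale images flattened to vectors, with PCA fitted once per dataset. The function name `pca_reconstruct` and the parameter `variance_to_keep` are hypothetical.

```python
import numpy as np


def pca_reconstruct(images, variance_to_keep=0.90):
    """Reconstruct images from their leading principal components.

    Assumption (not from the paper): `images` is an array of shape
    (n_images, height, width) holding equally sized grayscale images.
    Returns the reconstructed images and the number of components kept.
    """
    n, h, w = images.shape
    flat = images.reshape(n, h * w).astype(np.float64)

    # Center the data; PCA operates on deviations from the mean image.
    mean = flat.mean(axis=0)
    centered = flat - mean

    # SVD of the centered data: rows of vt are the principal axes and
    # squared singular values are proportional to explained variance.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    total = variance.sum()
    if total == 0.0:
        # All images identical: nothing to reduce.
        return flat.reshape(n, h, w), 0

    # Smallest k whose cumulative explained variance reaches the target.
    cumulative = np.cumsum(variance) / total
    k = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    k = min(k, len(s))

    # Project onto the first k components and map back to pixel space.
    reconstructed = (u[:, :k] * s[:k]) @ vt[:k] + mean
    return reconstructed.reshape(n, h, w), k
```

Calling `pca_reconstruct(dataset_images)` on each dataset would yield the kind of PCA dataset the abstract describes, with the same image shape as the input but with the low-variance directions discarded. Whether the published pipeline fits PCA per dataset in exactly this way is an assumption here.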
