Mitigating Spurious Correlations in Patch-wise Tumor Classification on High-Resolution Multimodal Images

Ihab Asaad, Maha Shadaydeh, Joachim Denzler

TL;DR

This work investigates spurious correlations in patch-wise binary tumor classification on high-resolution multimodal images. It identifies tissue size as a spurious cue that correlates with patch labels and discretizes this attribute into a binary spurious feature. The authors apply GERNE, a gradient extrapolation debiasing method, to maximize worst-group accuracy (WGA) and report an improvement in WGA of approximately 7% over ERM for two tissue-size thresholds, which improves performance on minority cases such as tumor patches with small tissue regions. The findings highlight the importance of spurious-correlation-aware learning in patch-based analysis and suggest that debiasing strategies can substantially improve robustness in practical diagnostic tasks. The approach is potentially applicable to other high-resolution domains where patch-wise decisions are common, such as remote sensing and materials inspection.

Abstract

Patch-wise multi-label classification provides an efficient alternative to full pixel-wise segmentation on high-resolution images, particularly when the objective is to determine the presence or absence of target objects within a patch rather than their precise spatial extent. This formulation substantially reduces annotation cost, simplifies training, and allows flexible patch sizing aligned with the desired level of decision granularity. In this work, we focus on a special case, patch-wise binary classification, applied to the detection of a single class of interest (tumor) on high-resolution multimodal nonlinear microscopy images. We show that, although this simplified formulation enables efficient model development, it can introduce spurious correlations between patch composition and labels: tumor patches tend to contain larger tissue regions, whereas non-tumor patches often consist mostly of background with small tissue areas. We further quantify the bias in model predictions caused by this spurious correlation, and propose to use a debiasing strategy to mitigate its effect. Specifically, we apply GERNE, a debiasing method that can be adapted to maximize worst-group accuracy (WGA). Our results show an improvement in WGA by approximately 7% compared to ERM for two different thresholds used to binarize the spurious feature. This enhancement boosts model performance on critical minority cases, such as tumor patches with small tissues and non-tumor patches with large tissues, and underscores the importance of spurious correlation-aware learning in patch-wise classification problems.
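As a rough illustration of the evaluation described in the abstract, the sketch below binarizes a continuous tissue ratio with a threshold and computes accuracy per (label, spurious attribute) group, with WGA taken as the minimum over groups. The function names, the threshold value, and the toy data are assumptions made for illustration; they are not taken from the paper, and the code does not implement GERNE itself.

```python
import numpy as np

def binarize_tissue_ratio(tissue_ratio, threshold):
    """Map a tissue ratio in [0, 1] to a binary attribute (1 = large tissue)."""
    return (np.asarray(tissue_ratio) > threshold).astype(int)

def group_accuracies(y_true, y_pred, attr):
    """Accuracy for each non-empty (label, attribute) group."""
    y_true, y_pred, attr = map(np.asarray, (y_true, y_pred, attr))
    accs = {}
    for y in (0, 1):
        for a in (0, 1):
            mask = (y_true == y) & (attr == a)
            if mask.any():
                accs[(y, a)] = float((y_pred[mask] == y_true[mask]).mean())
    return accs

def worst_group_accuracy(y_true, y_pred, attr):
    """Minimum accuracy over the non-empty groups."""
    return min(group_accuracies(y_true, y_pred, attr).values())

# Toy data (hypothetical): the predictions simply follow tissue size.
tissue_ratio = [0.9, 0.8, 0.1, 0.7, 0.05, 0.2, 0.6, 0.1]
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]

attr = binarize_tissue_ratio(tissue_ratio, threshold=0.5)  # assumed threshold
overall = float((np.asarray(y_true) == np.asarray(y_pred)).mean())
print("overall accuracy:", overall)
print("per-group accuracy:", group_accuracies(y_true, y_pred, attr))
print("worst-group accuracy:", worst_group_accuracy(y_true, y_pred, attr))
```

In this toy example, a classifier that relies only on tissue size reaches a reasonable overall accuracy while failing completely on the two minority groups (tumor patches with small tissue and non-tumor patches with large tissue), which is the failure mode that WGA is meant to expose.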

Paper Structure

This paper contains 19 sections, 5 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Illustration of the patch-wise classification pipeline and the emergence of spurious correlations between patch composition and binary labels. Top row: A high-resolution multimodal nonlinear image (left) from the dataset under study is divided into fixed-size patches to fit the neural network input. The corresponding pixel-wise segmentation mask is shown in the center (red: tumor tissue; black: non-tumor regions, including healthy tissue and background). The binarized patch-wise labels are shown on the right, where each patch is assigned a positive label (red) if it contains at least one tumor pixel, and a negative label (black) otherwise. Bottom row: Histogram-based analysis of patch composition (on the test set of the dataset under study) reveals strong correlations between tissue size and patch labels. The estimated conditional distributions $p(r_{P_k}^{\mathrm{tumor}} \mid y_k = 1)$, $p(r_{P_k}^{\mathrm{tumor\text{-}tissue}} \mid y_k = 1)$, and $p(r_{P_k}^{\mathrm{tissue}} \mid y_k = 0)$ are shown from left to right as normalized histograms. Tumor patches tend to contain larger tissue regions, whereas non-tumor patches are dominated by background with minimal tissue. These correlations may introduce shortcuts for ERM-trained models, causing them to rely on non-causal features such as overall tissue size. To verify whether the model exploits these spurious correlations, we trained a standard ERM-based binary classifier and overlaid, for each distribution, the proportion of correct (green) and incorrect (red) predictions. The resulting pattern confirms that models can rely on non-causal cues such as overall tissue size, underscoring the need for debiasing strategies in patch-based learning frameworks.
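The labeling rule in the caption (a patch is positive if it contains at least one tumor pixel) can be sketched as follows, together with per-patch tumor and tissue fractions. The mask encoding (0 = background, 1 = healthy tissue, 2 = tumor), the function name, and the patch size are hypothetical choices for this example. The two fractions are simple pixel ratios and may not match the exact definitions of the $r_{P_k}$ quantities used in the paper.

```python
import numpy as np

# Hypothetical mask encoding, chosen only for this example.
BACKGROUND, HEALTHY, TUMOR = 0, 1, 2

def patch_stats(mask, patch_size):
    """Split a 2D mask into non-overlapping square patches and return,
    per patch, the binary label, the tumor fraction, and the tissue fraction."""
    h, w = mask.shape
    rows, cols = h // patch_size, w // patch_size
    # Drop border pixels that do not fill a whole patch.
    cropped = mask[: rows * patch_size, : cols * patch_size]
    # Resulting shape: (rows, cols, patch_size, patch_size).
    patches = cropped.reshape(rows, patch_size, cols, patch_size).swapaxes(1, 2)
    tumor_frac = (patches == TUMOR).mean(axis=(2, 3))
    tissue_frac = (patches != BACKGROUND).mean(axis=(2, 3))
    labels = (tumor_frac > 0).astype(int)  # positive if any tumor pixel
    return labels, tumor_frac, tissue_frac

# Toy 4x4 mask split into four 2x2 patches.
mask = np.array([
    [2, 2, 0, 0],
    [2, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 2, 1, 1],
])
labels, tumor_frac, tissue_frac = patch_stats(mask, patch_size=2)
print(labels)
print(tumor_frac)
print(tissue_frac)
```

In the toy mask, the bottom-left patch is a tumor patch with a small tissue fraction and the bottom-right patch is a non-tumor patch fully covered by tissue. These correspond to the minority cases that the caption identifies as vulnerable to the tissue-size shortcut.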