Synthetic Data Reveals Generalization Gaps in Correlated Multiple Instance Learning

Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes

TL;DR

This work studies generalization gaps in multiple instance learning (MIL) for medical imaging by introducing Shifted Mean MIL, a synthetic task with a context window of size $R$ in which only $K$ of the $M$ features are discriminative. It derives the Bayes estimator $p(y_i|h_i)$ in closed form as an oracle baseline and systematically compares conventional MIL, correlated MIL (e.g., TransMIL), and smAP pooling. The results show that standard MIL and recent correlated MIL methods fall short of the Bayes predictor when context matters; smAP approaches but does not match it, even with $N=10^4$ training bags. This highlights a need for data-efficient MIL approaches with stronger inductive biases for exploiting local context in medical imaging. The findings motivate developing context-aware, regularized MIL architectures that perform well with limited labeled data on real-world medical datasets.

Abstract

Multiple instance learning (MIL) is often used in medical imaging to classify high-resolution 2D images by processing patches or to classify 3D volumes by processing slices. However, conventional MIL approaches treat instances separately, ignoring contextual relationships, such as the appearance of nearby patches or slices, that can be essential in real applications. We design a synthetic classification task where accounting for adjacent instance features is crucial for accurate prediction. We demonstrate the limitations of off-the-shelf MIL approaches by quantifying their performance compared to the optimal Bayes estimator for this task, which is available in closed form. We empirically show that newer correlated MIL methods still do not achieve the best possible performance when trained with ten thousand training samples, each containing many instances.
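To make the point about ignoring context concrete, here is a minimal sketch (not the authors' code) showing that mean and max pooling over per-instance scores give the same bag-level output no matter how the instances are ordered, so they cannot use adjacency. The bag size, feature dimension, linear scorer `w`, and the helper names are all hypothetical choices for illustration. The sliding-window statistic at the end is only a contrast, not a method from the paper.

```python
import numpy as np

def mean_pool(scores):
    """Bag-level score as the average of per-instance scores."""
    return float(np.mean(scores))

def max_pool(scores):
    """Bag-level score as the largest per-instance score."""
    return float(np.max(scores))

def window_max(scores, r=3):
    """Contrast case: max over averages of r adjacent instances (order-sensitive)."""
    return float(np.convolve(scores, np.ones(r) / r, mode="valid").max())

rng = np.random.default_rng(0)
bag = rng.normal(size=(20, 8))   # hypothetical bag: 20 instances, 8 features
w = rng.normal(size=8)           # hypothetical per-instance linear scorer

scores = bag @ w
shuffled_scores = bag[rng.permutation(len(bag))] @ w  # same instances, new order

# Permutation-invariant pooling: outputs agree after shuffling.
mean_same = bool(np.isclose(mean_pool(scores), mean_pool(shuffled_scores)))
max_same = bool(np.isclose(max_pool(scores), max_pool(shuffled_scores)))
print("mean pooling unchanged:", mean_same)
print("max pooling unchanged:", max_same)

# A statistic over adjacent instances typically changes when order changes.
print("window max (original order):", window_max(scores))
print("window max (shuffled order):", window_max(shuffled_scores))
```

Any signal that lives in which instances are next to each other is invisible to the first two pooling functions, which is the limitation the synthetic task is designed to expose.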

Paper Structure

This paper contains 13 sections, 14 equations, 7 figures.

Figures (7)

  • Figure 1: Test AUROC as a function of training set size $N$. All data is drawn from our Shifted Mean MIL data-generating process for binary classification, with $R{=}3, \Delta{=}2$. Conventional MIL approaches (Max, Mean, ABMIL) cannot match the Bayes estimator $p(y_i = 1 \mid h_i)$ as they do not account for dependencies between instances within a bag. Surprisingly, even with $N{=}10000$, correlated MIL approaches (TransMIL [shao2021transmil], smAP [castro2024sm]) do not reach the ceiling set by the Bayes estimator. smAP comes close, but bootstrapping reveals that the Bayes estimator maintains a statistically significant advantage (mean AUROC difference of 0.014; the 95% confidence interval [0.007, 0.022] excludes zero and all negative values). Takeaway: Our work reveals a need for data-efficient MIL that better accounts for context between instances.
  • Figure 2: Example data-generating distributions for a discriminative feature for negative ($y_i{=}0$) and positive ($y_i{=}1$) "bags" of $S_i$ instances drawn from our Shifted Mean MIL synthetic data. Setting $R{=}3$ means context around modified instances (in red) can help. In our experiments, we set $R{=}3$, $S_{\text{low}}{=}15$, $S_{\text{high}}{=}45$, $K{=}1$, $M{=}768$, $\mu{=}0$, and $\sigma{=}1$. We study how prediction quality changes as we vary training set size $N$ (Fig. 1) and class separation $\Delta$ (Fig. A.1).
  • Figure A.1: Test AUROC as a function of separation $\Delta$. All data is drawn from our Shifted Mean MIL data-generating process for binary classification, with $R{=}3$ and $N{=}400$.
  • Figure B.1: Handcrafted parameters (prediction-aggregation approach).
  • Figure B.2: Handcrafted parameters (embedding-aggregation approach).
  • ...and 2 more figures
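
The Figure 1 caption reports a bootstrap confidence interval for the AUROC difference between the Bayes estimator and smAP. The text here does not say which bootstrap variant was used, so the sketch below assumes a paired percentile bootstrap that resamples test bags with replacement. The function names, the toy labels and scores, and the number of resamples are illustrative assumptions, not the paper's setup or results.

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC as P(score_pos > score_neg) + 0.5 * P(tie), over all pos/neg pairs."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

def bootstrap_auroc_diff(y_true, scores_a, scores_b, n_boot=500, alpha=0.05, seed=0):
    """Paired percentile bootstrap for AUROC(a) - AUROC(b) on the same test bags."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, size=n)   # resample bags with replacement
        y = y_true[idx]
        if y.min() == y.max():             # AUROC is undefined with one class
            continue
        diffs.append(auroc(y, scores_a[idx]) - auroc(y, scores_b[idx]))
    diffs = np.array(diffs)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(diffs.mean()), float(lo), float(hi)

# Toy stand-ins for two models' predicted scores on 200 test bags (assumed data).
rng = np.random.default_rng(1)
y_test = np.repeat([0, 1], 100)
scores_a = y_test + rng.normal(scale=1.0, size=200)  # less noisy scorer
scores_b = y_test + rng.normal(scale=2.0, size=200)  # noisier scorer

mean_diff, ci_lo, ci_hi = bootstrap_auroc_diff(y_test, scores_a, scores_b)
print(f"mean AUROC difference: {mean_diff:.3f}, 95% CI: [{ci_lo:.3f}, {ci_hi:.3f}]")
```

Resampling the same bag indices for both models keeps the comparison paired, so the interval reflects the difference between models rather than variation in which test bags were drawn. An interval that lies entirely above zero is what the caption describes as a statistically significant advantage.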