SGPMIL: Sparse Gaussian Process Multiple Instance Learning

Andreas Lolos, Stergios Christodoulidis, Aris L. Moustakas, Jose Dolz, Maria Vakalopoulou

TL;DR

SGPMIL introduces a probabilistic attention-based MIL framework built on Sparse Gaussian Processes to jointly model bag- and instance-level predictions for gigapixel pathology images. By learning a variational posterior over attention scores and incorporating a feature-scaling mean, along with relaxed attention normalization and diagonal covariance, SGPMIL achieves calibrated uncertainty estimates, improved instance localization, and competitive bag-level accuracy. The method demonstrates strong performance and interpretability across diverse datasets (e.g., CAMELYON16, TCGA-NSCLC, PANDA, BRACS) while addressing numerical stability and scalability shortcomings of prior probabilistic MIL approaches. Overall, SGPMIL provides a principled, scalable approach for uncertainty-aware MIL in digital pathology with practical implications for safer clinical deployment.

Abstract

Multiple Instance Learning (MIL) offers a natural solution for settings where only coarse, bag-level labels are available and instance-level annotations are not. This is usually the case in digital pathology, where whole-slide images are gigapixel-sized. While deterministic attention-based MIL approaches achieve strong bag-level performance, they often overlook the uncertainty inherent in instance relevance. In this paper, we address the lack of uncertainty quantification in instance-level attention scores by introducing SGPMIL, a new probabilistic attention-based MIL framework grounded in Sparse Gaussian Processes (SGP). By learning a posterior distribution over attention scores, SGPMIL enables principled uncertainty estimation, resulting in more reliable and calibrated instance relevance maps. Our approach not only preserves competitive bag-level performance but also significantly improves the quality and interpretability of instance-level predictions under uncertainty. SGPMIL extends prior work by introducing feature scaling in the SGP predictive mean function, leading to faster training, improved efficiency, and enhanced instance-level performance. Extensive experiments on multiple well-established digital pathology datasets highlight the effectiveness of our approach across both bag- and instance-level evaluations. Our code is available at https://github.com/mandlos/SGPMIL.
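To make the idea of a posterior over attention scores more concrete, the following is a minimal sketch of stochastic attention pooling, not the authors' implementation. It assumes the per-patch posterior mean and standard deviation are already given (in SGPMIL they would come from the SGP layer and its inducing points), uses a plain softmax to normalize attention (the paper mentions a relaxed normalization, which is not reproduced here), and uses a hypothetical linear classifier. All function names and numeric values are illustrative.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_attention(mean, std, rng):
    """Draw one attention vector from a diagonal Gaussian, then normalize it."""
    raw = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
    return softmax(raw)


def pool(embeddings, attention):
    """Attention-weighted sum of patch embeddings (one bag-level vector)."""
    dim = len(embeddings[0])
    return [sum(a * e[d] for a, e in zip(attention, embeddings)) for d in range(dim)]


def predict(embeddings, mean, std, weights, bias, n_samples=32, seed=0):
    """Monte Carlo bag prediction: mean and std of class probabilities."""
    rng = random.Random(seed)
    n_classes = len(bias)
    probs = []
    for _ in range(n_samples):
        att = sample_attention(mean, std, rng)
        bag = pool(embeddings, att)
        logits = [sum(w * x for w, x in zip(weights[c], bag)) + bias[c]
                  for c in range(n_classes)]
        probs.append(softmax(logits))
    mean_p = [sum(p[c] for p in probs) / n_samples for c in range(n_classes)]
    # Population standard deviation across the Monte Carlo samples.
    std_p = [math.sqrt(sum((p[c] - mean_p[c]) ** 2 for p in probs) / n_samples)
             for c in range(n_classes)]
    return mean_p, std_p


# Toy bag with three 2-D patch embeddings (illustrative values only).
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
post_mean = [2.0, -1.0, 0.0]   # assumed posterior mean of attention scores
post_std = [0.1, 0.5, 0.3]     # assumed posterior std (diagonal covariance)
weights = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical linear classifier
bias = [0.0, 0.0]

mean_p, std_p = predict(embeddings, post_mean, post_std, weights, bias)
print(mean_p, std_p)
```

The standard deviation of the class probabilities across samples is one simple way to read off a bag-level uncertainty; the per-patch posterior standard deviation plays the analogous role at the instance level.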

Paper Structure

This paper contains 20 sections, 7 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: SGPMIL architecture overview. On the left, we illustrate the complete processing pipeline: WSIs are segmented and tiled into patches, which are then encoded using a frozen foundation model. The resulting patch embeddings are passed through an attention-based MIL head incorporating a probabilistic SGP mechanism. The attention-weighted embeddings are summed and projected through a trainable MLP. On the right, we highlight the probabilistic attention component. The SGP layer receives the embeddings along with learnable inducing points and infers a variational posterior over patch-level attention scores. Multiple attention samples are drawn to reweight the embeddings, which are then aggregated into stochastic slide-level representations and classified via a linear layer followed by softmax.
  • Figure 2: Slide-level performance across multiple datasets. ${}^\star$ denotes statistical significance ($p < 0.05$) based on one-sided paired t-tests.
  • Figure 3: Inducing-point structure and normalized attention heatmaps on a CAMELYON16 test slide. Top row: (a) slide with ground-truth annotations; (b) inducing-point label map; (c) top-7 most similar patches for the most representative inducing points. Bottom row: normalized attention heatmaps (scores in $[0,1]$). For DGRMIL, we use the cls-token attention; for BayesMIL, AGP, and SGPMIL, we plot the mean attention per patch. Yellow contours denote ground-truth annotations. See Supplementary Figure 3 for additional visualizations.
  • Figure 4: Prediction uncertainty for correctly (green) and incorrectly (red) classified WSIs in the BRACS (left) and PANDA (right) datasets. Each boxplot pair shows the distribution of the standard deviation of predicted class probabilities. A statistically significant difference in uncertainty is observed in both datasets ($p < 0.05$, Welch’s $t$-test (Welch, 1947)).
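
The comparison behind Figure 4 relies on Welch's $t$-test, which does not assume equal variances in the two groups. As a rough sketch of that statistic, the snippet below computes Welch's $t$ and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The function name `welch_t` and the uncertainty values are made up for illustration and are not results from the paper.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Unbiased sample variances divided by sample size (squared standard errors).
    se2_a = statistics.variance(a) / na
    se2_b = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical per-slide std of predicted class probabilities (not paper data).
std_correct = [0.02, 0.03, 0.01, 0.04, 0.02, 0.03]
std_incorrect = [0.10, 0.07, 0.12, 0.09]

t_stat, dof = welch_t(std_incorrect, std_correct)
print(t_stat, dof)
```

Turning the statistic into a $p$-value requires the Student $t$ distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the $p$-value directly.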