cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process

Yihang Chen, Tsai Hor Chan, Guosheng Yin, Yuming Jiang, Lequan Yu

TL;DR

The paper addresses two weaknesses of MIL for whole-slide histopathology, biased first-order feature aggregation and overfitting on scarce data, by introducing cDP-MIL, a cascaded Bayesian nonparametric framework. It performs instance-level clustering with a neural-network-parameterized Dirichlet process and bag-level prediction with a second Dirichlet process. This design enables richer distributional pooling that incorporates covariances, and it provides natural regularization and principled uncertainty via posterior predictive distributions. Through variational inference, the model learns expressive cluster moments and provides patch- and slide-level scores for tumor localization and uncertainty estimation, achieving state-of-the-art results on five WSI benchmarks and robust OOD detection. The approach targets robust cancer diagnosis, tumor localization, and generalization across datasets, and the code is publicly available for reproducibility.

Abstract

Multiple instance learning (MIL) has been extensively applied to whole slide histopathology image (WSI) analysis. The existing aggregation strategy in MIL, which primarily relies on the first-order distance (e.g., mean difference) between instances, fails to accurately approximate the true feature distribution of each instance, leading to biased slide-level representations. Moreover, the scarcity of WSI observations easily leads to model overfitting, resulting in unstable testing performance and limited generalizability. To tackle these challenges, we propose a new Bayesian nonparametric framework for multiple instance learning, which adopts a cascade of Dirichlet processes (cDP) to incorporate the instance-to-bag characteristic of WSIs. We perform feature aggregation based on the latent clusters formed by the Dirichlet process, which incorporates the covariances of the patch features and forms more representative clusters. We then perform bag-level prediction with another Dirichlet process model on the bags, which imposes a natural regularization on learning to prevent overfitting and enhance generalizability. Moreover, as a Bayesian nonparametric method, the cDP model can accurately generate posterior uncertainty, which allows for the detection of outlier samples and tumor localization. Extensive experiments on five WSI benchmarks validate the superior performance of our method, as well as its generalizability and ability to estimate uncertainties. Code is available at https://github.com/HKU-MedAI/cDPMIL.
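
To make the contrast between first-order pooling and covariance-aware pooling concrete, here is a minimal sketch of how per-cluster moments could be computed from soft cluster assignments. It is not the authors' implementation: the function name `cluster_moments`, the plain-list data layout, and the assumption that responsibilities are already available (for example from a variational E-step) are illustrative choices, and the code is written in dependency-free Python for readability rather than speed.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cluster_moments(
    features: Sequence[Sequence[float]],
    resp: Sequence[Sequence[float]],
) -> List[Tuple[float, Vector, Matrix]]:
    """Return (weight, mean, covariance) for each latent cluster.

    features: N patch embeddings of dimension D (hypothetical input).
    resp:     N rows of K soft assignments; each row sums to 1.
    """
    n, d, k = len(features), len(features[0]), len(resp[0])
    moments = []
    for c in range(k):
        # Effective number of patches assigned to cluster c.
        mass = sum(resp[i][c] for i in range(n))
        if mass == 0.0:
            # Empty cluster: report zero weight and zero moments.
            moments.append((0.0, [0.0] * d, [[0.0] * d for _ in range(d)]))
            continue
        # First-order moment: responsibility-weighted mean.
        mean = [
            sum(resp[i][c] * features[i][j] for i in range(n)) / mass
            for j in range(d)
        ]
        # Second-order moment: responsibility-weighted covariance.
        cov = [
            [
                sum(
                    resp[i][c]
                    * (features[i][a] - mean[a])
                    * (features[i][b] - mean[b])
                    for i in range(n)
                )
                / mass
                for b in range(d)
            ]
            for a in range(d)
        ]
        moments.append((mass / n, mean, cov))
    return moments


# Toy bag: four 2-D "patch features" with hard assignments to two clusters.
toy_features = [[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [10.0, 4.0]]
toy_resp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_moments = cluster_moments(toy_features, toy_resp)
```

In the toy bag, the two clusters differ not only in their means but also in their covariance structure (correlated features in the first, variance along a single axis in the second). A single bag-level mean would discard that information, which is the limitation of first-order aggregation that the abstract describes.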

Paper Structure

This paper contains 14 sections, 9 equations, 6 figures, 7 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Illustration of a cascaded Dirichlet process model on whole slide images. We first pool the WSI patches with latent Gaussian distributions, and then categorize the pooled WSIs into $K$ classes.
  • Figure 2: The overall framework of our proposed cascaded DP model. We first use Otsu's segmentation to outline tissue regions and select foreground instances for MIL learning. We then design an aggregation module based on the Dirichlet process to cluster instance-level features into latent clusters. Finally, a DP-based prediction module makes predictions from the distribution learned for each cluster.
  • Figure 3: The probabilistic model of the cascaded DP in plate notation on the $b$-th bag.
  • Figure 4: Tumor region localization. Left: Ground truth; middle: HEAT; right: cDP-MIL. Ground truth regions are outlined with red boundaries. Lighter yellow indicates more important regions.
  • Figure 5: Performance of cDP-MIL with different concentration parameters $\eta_1$ and $\eta_2$. Left: COAD; middle: BRCA; right: Camelyon 16.
  • ...and 1 more figure
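
As background for the concentration parameters $\eta_1$ and $\eta_2$ in Figure 5, the sketch below draws mixture weights from a truncated stick-breaking construction, a standard way to represent a Dirichlet process. It assumes $\eta$ plays the usual role of a DP concentration parameter; the function name `stick_breaking_weights` and the truncation level are illustrative and are not taken from the paper or its code.

```python
import random
from typing import List


def stick_breaking_weights(
    eta: float, truncation: int, rng: random.Random
) -> List[float]:
    """Draw truncated stick-breaking weights for a DP with concentration eta."""
    if eta <= 0.0 or truncation < 1:
        raise ValueError("eta must be positive and truncation at least 1")
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(truncation - 1):
        # Break off a Beta(1, eta) fraction of the remaining stick.
        frac = rng.betavariate(1.0, eta)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    # The last component absorbs the leftover mass so the weights sum to 1.
    weights.append(remaining)
    return weights


rng = random.Random(0)
few_clusters = stick_breaking_weights(eta=0.1, truncation=10, rng=rng)
many_clusters = stick_breaking_weights(eta=10.0, truncation=10, rng=rng)
```

Because each break fraction has expectation $1/(1+\eta)$, a small $\eta$ tends to put most of the mass on the first few components, while a large $\eta$ spreads it over many. This is one reason a model's behavior can be sensitive to the choice of concentration parameter.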