Label-free Concept Based Multiple Instance Learning for Gigapixel Histopathology

Susu Sun, Leslie Tessier, Frédérique Meeuwsen, Clément Grisi, Dominique van Midden, Geert Litjens, Christian F. Baumgartner

TL;DR

Concept MIL delivers inherently interpretable whole-slide image classification by linking image features to human-understandable pathology concepts via a vision-language foundation model. The architecture combines an image MIL branch that selects the top patches with a concept MIL branch that makes predictions as a linear combination of concept activations from those patches, enabling faithful local explanations and dataset-wide global explanations. It eliminates the need for manual concept labeling by leveraging CONCH for label-free concept projection, and it is validated on Camelyon16 and PANDA with competitive accuracy and AUC, while demonstrating alignment between model concepts and pathologist knowledge. Quantitative and qualitative evaluations, including a pathologist user study, indicate that the explanations are meaningful and useful for clinical trust, with strong local localization and coherent global concept distributions that reflect tumor vs. normal differences. The work highlights the practical potential of concept-based, inherently interpretable MIL for safe and transparent deployment in computational pathology, and outlines directions for multi-scale concepts and multi-class extensions.

Abstract

Multiple Instance Learning (MIL) methods allow for gigapixel Whole-Slide Image (WSI) analysis with only slide-level annotations. Interpretability is crucial for safely deploying such algorithms in high-stakes medical domains. Traditional MIL methods offer explanations by highlighting salient regions. However, such spatial heatmaps provide limited insights for end users. To address this, we propose a novel inherently interpretable WSI-classification approach that uses human-understandable pathology concepts to generate explanations. Our proposed Concept MIL model leverages recent advances in vision-language models to directly predict pathology concepts based on image features. The model's predictions are obtained through a linear combination of the concepts identified on the top-K patches of a WSI, enabling inherent explanations by tracing each concept's influence on the prediction. In contrast to traditional concept-based interpretable models, our approach eliminates the need for costly human annotations by leveraging the vision-language model. We validate our method on two widely used pathology datasets: Camelyon16 and PANDA. On both datasets, Concept MIL achieves AUC and accuracy scores over 0.9, putting it on par with state-of-the-art models. We further find that 87.1% (Camelyon16) and 85.3% (PANDA) of the top 20 patches fall within the tumor region. A user study shows that the concepts identified by our model align with the concepts used by pathologists, making it a promising strategy for human-interpretable WSI classification.
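
The prediction step described in the abstract (a linear combination of the concepts found on the top-K patches) might be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function name `concept_mil_predict`, the array shapes, the mean pooling over the selected patches, and the sigmoid output are all assumptions made for the example.

```python
import numpy as np

def concept_mil_predict(attention, concept_feats, weights, bias=0.0, k=20):
    """Hypothetical sketch of a top-K, linear concept-based slide prediction.

    attention:     (N,)   attention score per patch (from an image MIL branch)
    concept_feats: (N, C) concept activation per patch and concept
    weights:       (C,)   linear weight per concept
    """
    k = min(k, attention.shape[0])
    # Indices of the K patches with the highest attention scores.
    top_idx = np.argsort(attention)[::-1][:k]
    # Assumption: pool concept activations of the selected patches by averaging.
    slide_concepts = concept_feats[top_idx].mean(axis=0)
    # Each concept's contribution is its weight times its pooled activation,
    # so the logit decomposes exactly into per-concept terms.
    contributions = weights * slide_concepts
    logit = contributions.sum() + bias
    prob = 1.0 / (1.0 + np.exp(-logit))
    return prob, contributions, top_idx

# Toy usage with 4 patches and 2 concepts.
attention = np.array([0.1, 0.9, 0.5, 0.3])
concept_feats = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8]])
weights = np.array([1.0, -1.0])
prob, contributions, top_idx = concept_mil_predict(
    attention, concept_feats, weights, k=2
)
print(top_idx, contributions, prob)
```

Because the logit is a plain sum of per-concept terms, the `contributions` vector can be read directly as an explanation of the slide-level prediction, which is the property the abstract refers to as tracing each concept's influence.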

Paper Structure

This paper contains 24 sections, 9 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Training and obtaining prediction and explanation with Concept MIL. During training, we extract WSI image features as shown in (a) and project these image features to the predefined concept space to generate concept features as shown in (b), then jointly train the image MIL branch and concept MIL branch through a patch selection module as shown in (c). During inference, the concept MIL branch generates the final prediction using the concept features of top K patches selected by the image MIL branch. To explain the prediction for an individual sample, we provide the attention map from the image MIL branch with the top K patches highlighted by green dots, along with the corresponding top K patches and the concept contributions to the WSI prediction, as illustrated in (d).
  • Figure 2: Patches containing fat cells get high cosine similarity scores with the concept of "fibrous tissue", indicating a potential misalignment between the image and text spaces in CONCH (Lu et al., 2024).
  • Figure 3: Local explanations for predictions of tumor and normal cases from the Camelyon16 dataset. A local explanation for a WSI prediction includes four components: (a) an attention map, (b) the top 20 patches selected based on attention scores, (c) concept features for each selected patch, and (d) the whole-slide-level concept contribution vector.
  • Figure 4: Global explanations for the model trained on the PANDA dataset. (a) and (b) are WSI-level mean concept contribution vectors for tumor and normal predictions. (c) and (d) show t-SNE plots of concept vectors from normal and tumor cases at the patch level and WSI level. (e) shows distributions of three individual concepts across normal and tumor cases, along with their corresponding Jensen–Shannon divergence scores.
  • Figure 5: (a) Attention map from the Concept MIL model on a Camelyon16 sample, with the top 20 patches highlighted by green dots. (b) Ground truth mask for the sample, showing tumor regions in yellow and normal tissue in dark green.
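
Two quantities mentioned in the captions above, the cosine similarity between patches and concepts (Figures 1 and 2) and the Jensen–Shannon divergence between concept distributions (Figure 4), can be illustrated with a small NumPy sketch. It assumes that patch and concept-text embeddings have already been computed by a vision-language model and live in a shared space; the function names, the histogram inputs, and the base-2 logarithm are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def project_to_concepts(image_emb, text_emb):
    """Cosine similarity between each patch and each concept.

    image_emb: (N, D) patch embeddings (assumed precomputed)
    text_emb:  (C, D) concept text embeddings (assumed precomputed)
    returns:   (N, C) matrix of cosine similarities
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return img @ txt.T

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2, so in [0, 1]) of two histograms."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0) on empty bins
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return np.sum(a * np.log2(a / b))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy usage: 2 patches, 2 concepts, 2-dimensional embeddings.
sims = project_to_concepts(
    np.array([[1.0, 0.0], [0.0, 2.0]]),
    np.array([[3.0, 0.0], [1.0, 1.0]]),
)
print(sims)
# Hypothetical histograms of one concept's scores in normal vs. tumor slides.
print(js_divergence([0.7, 0.2, 0.1], [0.1, 0.2, 0.7]))
```

A divergence near 0 would indicate that a concept is distributed similarly in normal and tumor cases, while a value near 1 would indicate that the two distributions barely overlap.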