xMIL: Insightful Explanations for Multiple Instance Learning in Histopathology

Julius Hense; Mina Jamshidi Idaji; Oliver Eberle; Thomas Schnake; Jonas Dippel; Laure Ciernik; Oliver Buchstab; Andreas Mock; Frederick Klauschen; Klaus-Robert Müller

TL;DR

This work addresses the gap in explainability for multiple instance learning in histopathology by introducing xMIL, a general framework that separates aggregation from per-instance evidence via an evidence function. It then adapts Layer-wise Relevance Propagation to MIL (xMIL-LRP), incorporating attention-specific (AH-rule) and normalization-specific (LN-rule) propagation to produce context-aware, positive/negative instance explanations with instance-level conservation. Across toy MNIST-based tasks and four real-world histopathology datasets, xMIL-LRP demonstrates superior faithfulness (lower AUPC) and more informative heatmaps than baselines, especially in biomarker prediction tasks using Transformer architectures. The approach enables pathologists to extract fine-grained insights and supports model debugging and knowledge discovery, with potential applicability to other domains involving complex MIL models and multi-modal inputs.
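The instance-level conservation mentioned above can be illustrated with a minimal NumPy sketch of an attention-pooling MIL model with a linear prediction head. Everything in it is an assumption made for illustration, not the authors' implementation: the function names, the toy dimensions, and in particular the reading of the AH-rule as "treat attention weights as constants during backpropagation". Under that assumption, the relevance of instance k reduces to its attention weight times the head's response to its features, and the scores can be positive or negative.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def attn_mil_forward(H, V, w_att, w_out, b_out):
    """Toy attention-MIL forward pass (hypothetical parameterization).

    H: (K, D) instance features, V: (D, L) and w_att: (L,) attention
    parameters, w_out: (D,) and b_out: scalar linear prediction head.
    """
    a = softmax(np.tanh(H @ V) @ w_att)   # (K,) attention weights
    z = a @ H                             # (D,) bag representation
    y = float(z @ w_out + b_out)          # bag-level score
    return y, a

def instance_relevance(H, a, w_out):
    # Assumption: attention weights are held constant, so relevance
    # flows only through the feature path: R_k = a_k * <h_k, w_out>.
    return a * (H @ w_out)

# Toy bag with 5 instances and 4 features (random, illustration only).
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))
V = rng.normal(size=(4, 3))
w_att = rng.normal(size=3)
w_out = rng.normal(size=4)
b_out = 0.1

y, a = attn_mil_forward(H, V, w_att, w_out, b_out)
R = instance_relevance(H, a, w_out)
# Conservation: instance scores sum to the model output minus the bias.
print(np.isclose(R.sum(), y - b_out))
```

In this toy setting the signed scores add up to the bag-level output (up to the bias term), which is the sense in which positive and negative instance evidence is conserved.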

Abstract

Multiple instance learning (MIL) is an effective and widely used approach for weakly supervised machine learning. In histopathology, MIL models have achieved remarkable success in tasks like tumor detection, biomarker prediction, and outcome prognostication. However, MIL explanation methods are still lagging behind, as they are limited to small bag sizes or disregard instance interactions. We revisit MIL through the lens of explainable AI (XAI) and introduce xMIL, a refined framework with more general assumptions. We demonstrate how to obtain improved MIL explanations using layer-wise relevance propagation (LRP) and conduct extensive evaluation experiments on three toy settings and four real-world histopathology datasets. Our approach consistently outperforms previous explanation attempts with particularly improved faithfulness scores on challenging biomarker prediction tasks. Finally, we showcase how xMIL explanations enable pathologists to extract insights from MIL models, representing a significant advance for knowledge discovery and model debugging in digital histopathology. Code is available at: https://github.com/bifold-pathomics/xMIL.

Paper Structure (21 sections, 12 equations, 8 figures, 2 tables)

Introduction
Background
Multiple instance learning (MIL)
Limitations of MIL in histopathology
Methods
xMIL: An XAI-based framework for multiple instance learning
xMIL-LRP: Estimating the evidence function
Properties of xMIL-LRP and other explanation methods
Experiments and results
Toy experiments
Histopathology experiments
Extracting insights from xMIL-LRP heatmaps
Conclusion
Appendix
Baseline MIL explanation methods
...and 6 more sections

Figures (8)

Figure 1: In digital pathology, heatmaps guide the identification of tissue slide areas most important for a model prediction. The figure displays heatmaps from different MIL explanation methods (columns) for a head and neck tumor slide (top row) with a selected zoomed-in region (bottom row). The MIL model has been trained to predict HPV status. The xMIL-LRP heatmap shows that the model identified evidence in favor of an HPV infection at the tumor border (red area) and evidence against an HPV infection inside the tumor (blue area, lower half of the tissue). The dominant blue region explains why the model mispredicted the slide as HPV-negative. Investigation of the tumor border by a pathologist revealed a higher lymphocyte density, which is one of the known recurrent but not always defining visual features of HPV infection in head and neck tumors. xMIL-LRP allows pathologists to extract fine-grained insights about the model strategy. In contrast, the "attention" and "single" methods neither explain the negative prediction nor distinguish the relevant areas.
Figure 2: The two steps of xMIL: estimating the aggregation function (A) and the evidence function (B). Panel A shows a block diagram of a MIL model applied to a histopathology slide. The feature extraction module is typically a frozen foundation model followed by a shallow MLP. In most recent MIL models, the aggregation module uses attention mechanisms to combine the instance feature vectors into a single feature representation per bag. The prediction head is a linear layer or an MLP. Panel B schematically shows xMIL-LRP for explaining AttnMIL. In xMIL-LRP, the model output is backpropagated to the input instances. The colored lines represent the relevance flow, with red and blue encoding positive and negative values, respectively. The attention module is handled via the AH-rule as described in the section "xMIL-LRP: Estimating the evidence function". As discussed in the section "Properties of xMIL-LRP and other explanation methods", the instance explanation scores can be computed at the output of the foundation model or at the input level.
Figure 3: Patch dropping results for TransMIL. The first row depicts the perturbation curves, where the solid lines are the average perturbation curves and the shaded areas are the standard error of the mean at each perturbation step. Each boxplot in the second row shows the distribution of AUPC values over all test set slides for one explanation method. In each boxplot, the red line marks the median and the red dot marks the mean. Lower perturbation curves and AUPCs represent higher faithfulness.
Figure 4: Patch dropping results for AttnMIL. The first row depicts the perturbation curves, where the solid lines are the average perturbation curves and the shaded areas are the standard error of the mean at each perturbation step. Each boxplot in the second row shows the distribution of AUPC values over all test set slides for one explanation method. In each boxplot, the red line marks the median and the red dot marks the mean. Lower perturbation curves and AUPCs represent higher faithfulness.
Figure 5: Exemplary histological features of HPV-negative and -positive HNSC.
...and 3 more figures
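
The patch dropping evaluation referred to in the captions of Figures 3 and 4 can be sketched roughly as follows. This is a hedged illustration, not the paper's protocol: the function names, the number of perturbation steps, the choice to drop the highest-scoring patches first, and the toy additive "model" are all assumptions. The sketch only shows why a more faithful ranking yields a lower perturbation curve and hence a lower area under the perturbation curve (AUPC).

```python
import numpy as np

def perturbation_curve(predict, bag, scores, n_steps=5):
    """Model output after dropping the top-scored instances step by step.

    predict: callable mapping a bag (array of instances) to a scalar.
    scores:  one explanation score per instance (higher = more relevant).
    """
    order = np.argsort(-scores)                 # most relevant first
    k = len(scores)
    # Assumed schedule: evenly spaced cut points, always keeping >= 1 instance.
    cuts = np.linspace(0, k - 1, n_steps + 1).astype(int)
    curve = []
    for c in cuts:
        keep = np.sort(order[c:])               # remaining instances, original order
        curve.append(predict(bag[keep]))
    return np.array(curve)

def aupc(curve):
    # Trapezoidal area under the curve on a normalized [0, 1] axis.
    x = np.linspace(0.0, 1.0, len(curve))
    return float(np.sum((curve[1:] + curve[:-1]) / 2.0 * np.diff(x)))

# Toy additive model: the bag score is the sum of scalar instance values.
bag = np.array([3.0, -1.0, 2.0, 0.5, -2.0, 1.0])
predict = lambda b: float(b.sum())

faithful = aupc(perturbation_curve(predict, bag, scores=bag))
inverted = aupc(perturbation_curve(predict, bag, scores=-bag))
print(faithful < inverted)
```

For this toy model the exact instance contributions are known, so scoring instances by their true contribution gives a lower AUPC than the inverted ranking, matching the reading "lower is more faithful" in the captions.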

Theorems & Definitions (2)

Definition 3.1: Explainable multiple instance learning
Definition 3.2: Properties of the evidence function
