Characterizing the Interpretability of Attention Maps in Digital Pathology

Tomé Albuquerque, Anil Yüce, Markus D. Herrmann, Alvaro Gomariz

TL;DR

This work tackles the risk that attention maps from attention-based MIL (ABMIL) in digital pathology may highlight spurious correlations rather than causal features. It proposes a controlled framework that injects tile- and WSI-based confounders via Bernoulli-modulated modifications $M_p^t$ and $M_p^s$, and evaluates both predictive performance and interpretability using the metrics Confounder Robustness (CR) and Normalized Cross Correlation (NCC). Through synthetic modifications (Clever Hans, blur, and pen marks) and a feature-based SDANA ablation, the authors show that ABMIL can rely on confounders to boost AUC and that attention maps increasingly target modified tiles as the modification probability $p$ grows, with robustness depending on confounder type. The framework is designed to be extensible to other confounders and features, offering a rigorous tool for benchmarking interpretability and guiding biomarker discovery in digital pathology.

Abstract

Interpreting machine learning model decisions is crucial for high-risk applications such as healthcare. In digital pathology, large whole slide images (WSIs) are decomposed into smaller tiles, and tile-derived features are processed by attention-based multiple instance learning (ABMIL) models to predict WSI-level labels. These networks generate tile-specific attention weights, which can be visualized as attention maps for interpretability. However, no standardized evaluation framework exists for these maps, which leaves open questions about their reliability and their ability to reveal spurious correlations that can mislead models. We propose a framework to assess the ability of attention networks to attend to relevant features in digital pathology by creating artificial model confounders and using dedicated interpretability metrics. Models are trained and evaluated on data with tile modifications correlated with WSI labels, enabling the analysis of model sensitivity to artificial confounders and of how accurately attention maps highlight them. Confounders are introduced either through synthetic tile modifications or through tile ablations based on specific image-based features, with the latter used to assess more clinically relevant scenarios. We also analyze the impact of varying the quantity of confounders at both the tile and WSI levels. Our results show that ABMIL models perform as desired within our framework. While attention maps generally highlight relevant regions, their robustness is affected by the type and number of confounders. Our versatile framework can be used to evaluate other methods and to explore the image-based features driving model predictions, which could aid biomarker discovery.
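
The confounder injection and attention scoring described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sample_confounder_masks`, `ncc`), the choice to modify only positive-label slides, and the zero-mean form of normalized cross correlation are all assumptions made for illustration, since the exact definitions are not given in this summary.

```python
import numpy as np


def sample_confounder_masks(labels, tiles_per_slide, p_slide, p_tile, rng):
    """Return one boolean mask per slide marking which tiles would be modified.

    Assumption: only positive-label slides (label == 1) are eligible, so that
    the modification correlates with the WSI label.
    """
    masks = []
    for label, n_tiles in zip(labels, tiles_per_slide):
        # Slide-level Bernoulli draw with probability p_slide.
        slide_selected = label == 1 and rng.random() < p_slide
        if slide_selected:
            # Tile-level Bernoulli draws with probability p_tile.
            mask = rng.random(n_tiles) < p_tile
        else:
            mask = np.zeros(n_tiles, dtype=bool)
        masks.append(mask)
    return masks


def ncc(attention, mask):
    """Zero-mean normalized cross correlation between attention and a mask.

    Returns a value in [-1, 1]; returns 0.0 when either input is constant,
    because the correlation is undefined in that case.
    """
    a = np.asarray(attention, dtype=float)
    m = np.asarray(mask, dtype=float)
    a = a - a.mean()
    m = m - m.mean()
    denom = np.sqrt((a ** 2).sum() * (m ** 2).sum())
    if denom == 0:
        return 0.0
    return float((a * m).sum() / denom)


# Hypothetical usage: two slides, the second one positive.
rng = np.random.default_rng(0)
masks = sample_confounder_masks([0, 1], [4, 4], p_slide=1.0, p_tile=0.5, rng=rng)
score = ncc([0.1, 0.9, 0.1, 0.9], [0, 1, 0, 1])
```

Under these assumptions, a score near 1 means the attention weights concentrate on the modified tiles, while a score near 0 means attention is unrelated to where the confounder was placed.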

Paper Structure (9 sections, 3 equations, 4 figures)

Figures (4)

  • Figure 1: Overview of our framework for the evaluation of attention maps in digital pathology (DP).
  • Figure 2: Examples of synthetic tile modifications employed.
  • Figure 3: Classification (top) and explainability performance results (middle and bottom) for synthetic experiments.
  • Figure 4: Feature-based sampling strategy. (a) Original distribution. (b) SDANA class separation. (c) Distribution after ablation. (d) Classification results.