SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology

Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R. Gupta, Prateek Prasanna

TL;DR

This work introduces Self-Interpretable MIL (SI-MIL), a dual-branch framework that jointly trains a deep MIL model with a pathologist-friendly, handcrafted PathExpert feature branch to deliver inherent, feature-level interpretability for gigapixel histopathology WSIs. By tying a differentiable Top-$K$ patch selector to a linear, PathExpert-based predictor and guiding it with deep feature supervision, SI-MIL achieves competitive predictive performance across breast, lung, and colorectal cancer datasets while providing faithful, pathologist-aligned explanations. The authors validate interpretability locally (slide-level explanations) and globally (cohort-level separability), conduct domain expert studies, and release a ~2.2K-WSI dataset with nuclei maps and PathExpert features to spur reproducible interpretable MIL research. Overall, SI-MIL demonstrates that interpretability and accuracy can be jointly optimized in WSIs, enabling feature-grounded reasoning that aligns with clinical practice and paving the way for broader adoption in pathology. The work also outlines extensive ablations, dataset contributions, and visualization tools to support future work in interpretable MIL for medical imaging.

Abstract

Introducing interpretability and reasoning into Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis is challenging, given the complexity of gigapixel slides. Traditionally, MIL interpretability is limited to identifying salient regions deemed pertinent for downstream tasks, offering little insight to the end-user (pathologist) regarding the rationale behind these selections. To address this, we propose Self-Interpretable MIL (SI-MIL), a method intrinsically designed for interpretability from the outset. SI-MIL employs a deep MIL framework to guide an interpretable branch grounded on handcrafted pathological features, facilitating linear predictions. Beyond identifying salient regions, SI-MIL uniquely provides feature-level interpretations rooted in pathological insights for WSIs. Notably, SI-MIL, with its linear prediction constraints, challenges the prevalent myth of an inevitable trade-off between model interpretability and performance, demonstrating competitive results compared to state-of-the-art methods on WSI-level prediction tasks across three cancer types. In addition, we thoroughly benchmark the local and global interpretability of SI-MIL in terms of statistical analysis, a domain expert study, and desiderata of interpretability, namely, user-friendliness and faithfulness.

Paper Structure

This paper contains 35 sections, 10 equations, 16 figures, and 9 tables.

Figures (16)

  • Figure 1: Unlike conventional MIL, SI-MIL co-learns from deep and handcrafted features (referred to as PathExpert features). While both MILs offer patch-level interpretability, only ours provides PathExpert feature-level rationale for WSI predictions. The attention maps in SI-MIL are grounded on geometrically and physically-interpretable descriptors.
  • Figure 2: Overview of SI-MIL: The conventional MIL branch guides the Patch Attention-Guided Top-$K$ (PAG Top-$K$) patch selection module to select the PathExpert features of key regions from the WSI, followed by linear scaling in the Self-Interpretable branch, and linear prediction.
  • Figure 3: Qualitative Patch-Feature importance report: In (a) and (b), we present WSIs with overlaid attention heatmaps and the top two patches, along with their nuclei maps. In (c), we show the mean contribution magnitude of select representative features across the top $K$ patches employed in the Self-Interpretable branch. Additionally, we display a feature density plot that quantifies the distribution of features within the $K$ patches. For brevity, we omit the y-axis. Given that these features are normalized, a curve leaning towards the right indicates higher/positive values, while one towards the left signifies lower/negative values, depending on the feature. Finally, in (d), we illustrate the description and visualization of the representative features from (c) at varying values.
  • Figure 4: Cohort-level Interpretation: Separability of top $K$ patches of WSIs across classes in the PathExpert feature space. Multivariate and Univariate analyses depict that the top $K$ patches selected by SI-MIL and their PathExpert features are more separable.
  • Figure 5: PAG Top-$K$ module ablation
  • ...and 11 more figures
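
The pipeline summarized in the Figure 2 caption (attention-guided Top-$K$ selection, linear scaling, linear prediction) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a hard, non-differentiable top-k in place of the differentiable selector, assumes the attention scores from the deep MIL branch are already available, assumes mean aggregation over the selected patches, and assumes a binary task with a sigmoid output. The function name and all parameter names (`scale`, `weights`, `bias`) are hypothetical.

```python
import numpy as np

def si_mil_forward_sketch(attention, path_expert, scale, weights, bias, k):
    """Illustrative forward pass of a self-interpretable linear branch.

    attention:   (N,) patch scores from the deep MIL branch (assumed given)
    path_expert: (N, D) normalized handcrafted features, one row per patch
    scale:       (D,) per-feature linear scaling (hypothetical parameterization)
    weights:     (D,) linear classifier weights (hypothetical)
    bias:        scalar classifier bias (hypothetical)
    k:           number of patches to keep
    """
    # Hard top-k: indices of the k highest-attention patches.
    # (The paper's selector is differentiable; this stand-in is not.)
    top_idx = np.argsort(attention)[::-1][:k]

    selected = path_expert[top_idx]      # (k, D) features of the key patches
    scaled = selected * scale            # (k, D) feature-wise linear scaling
    contributions = scaled * weights     # (k, D) per-patch, per-feature terms

    # Averaging over patches keeps the prediction linear in the features,
    # so each feature's mean contribution can be read off directly.
    mean_contrib = contributions.mean(axis=0)   # (D,)
    logit = mean_contrib.sum() + bias
    prob = 1.0 / (1.0 + np.exp(-logit))         # assumed binary task
    return top_idx, mean_contrib, prob


# Toy usage: 4 patches, 2 handcrafted features each.
attention = np.array([0.1, 0.9, 0.5, 0.7])
path_expert = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 0.0], [4.0, -1.0]])
top_idx, mean_contrib, prob = si_mil_forward_sketch(
    attention, path_expert,
    scale=np.ones(2), weights=np.array([0.5, -1.0]), bias=0.0, k=2,
)
print(top_idx, mean_contrib, prob)
```

Because the prediction is a sum of per-feature terms, `mean_contrib` plays the role of the feature-level report described for Figure 3(c): its entries sum, together with the bias, to the logit.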