MeCaMIL: Causality-Aware Multiple Instance Learning for Fair and Interpretable Whole Slide Image Diagnosis

Yiran Song, Yikai Zhang, Shuang Zhou, Guojun Xiong, Xiaofeng Yang, Nian Wang, Fenglong Ma, Rui Zhang, Mingquan Lin

TL;DR

MeCaMIL addresses the need for fair and interpretable whole-slide image diagnosis by integrating a structured causal graph that models demographics as exogenous factors influencing a disease representation Z, which in turn determines the diagnosis Y. The method combines a causality-aware attention mechanism, a graph neural network-based structural equation model (SEM), and a demographic reconstruction objective to disentangle disease signals from demographic biases. Empirical results on CAMELYON16, TCGA-Lung, and TCGA-Multi show state-of-the-art accuracy and substantial reductions in demographic disparity, and an extension to survival prediction across five cancer types shows that the approach also holds up on time-to-event outcomes. The work provides a principled framework for fair, interpretable AI in digital pathology, with ablations confirming the essential role of the collider-based causal structure and offering a blueprint for broader fairness-aware medical imaging applications.

Abstract

Multiple instance learning (MIL) has emerged as the dominant paradigm for whole slide image (WSI) analysis in computational pathology, achieving strong diagnostic performance through patch-level feature aggregation. However, existing MIL methods face two critical limitations: (1) they rely on attention mechanisms that lack causal interpretability, and (2) they fail to integrate patient demographics (age, gender, race), leading to fairness concerns across diverse populations. These shortcomings hinder clinical translation, where algorithmic bias can exacerbate health disparities. We introduce MeCaMIL, a causality-aware MIL framework that explicitly models demographic confounders through structured causal graphs. Unlike prior approaches that treat demographics as auxiliary features, MeCaMIL employs principled causal inference, leveraging do-calculus and collider structures, to disentangle disease-relevant signals from spurious demographic correlations. Extensive evaluation on three benchmarks demonstrates state-of-the-art performance on CAMELYON16 (ACC/AUC/F1: 0.939/0.983/0.946), TCGA-Lung (0.935/0.979/0.931), and TCGA-Multi (0.977/0.993/0.970; five cancer types). Critically, MeCaMIL achieves superior fairness: demographic disparity variance drops by over 65% (relative reduction, averaged across attributes), with notable improvements for underserved populations. The framework generalizes to survival prediction (mean C-index: 0.653, +0.017 over the best baseline across five cancer types). Ablation studies confirm that the causal graph structure is essential: alternative designs yield 0.048 lower accuracy and 4.2x worse fairness. These results establish MeCaMIL as a principled framework for fair, interpretable, and clinically actionable AI in digital pathology. Code will be released upon acceptance.
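The abstract reports fairness as a relative reduction in demographic disparity variance but does not define the metric in this section. The sketch below assumes one plausible reading: the variance of per-subgroup accuracy for a single attribute, compared between a baseline and a debiased model. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import pvariance


def group_accuracies(y_true, y_pred, groups):
    """Return accuracy per demographic subgroup (e.g. per gender value)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def disparity_variance(y_true, y_pred, groups):
    """Assumed metric: population variance of the subgroup accuracies."""
    return pvariance(list(group_accuracies(y_true, y_pred, groups).values()))


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Toy data (hypothetical): eight slides, one binary attribute.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
baseline_pred = [1, 1, 0, 0, 1, 1, 0, 1]  # accurate on F, poor on M
debiased_pred = [1, 1, 0, 1, 1, 0, 0, 1]  # more even across F and M

var_base = disparity_variance(y_true, baseline_pred, groups)
var_fair = disparity_variance(y_true, debiased_pred, groups)
reduction = relative_reduction(var_base, var_fair)
print(f"variance {var_base:.4f} -> {var_fair:.4f}, reduction {reduction:.1%}")
```

Under this reading, a lower variance means the model's accuracy is more uniform across subgroups; averaging the reduction over gender, race, and age would give a single summary figure comparable in form to the one quoted above.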

Paper Structure

This paper contains 35 sections, 22 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Overview of the proposed MeCaMIL for whole slide image (WSI) diagnosis. Each WSI is divided into image patches, which are independently encoded into 2048-dimensional instance features using a pretrained encoder. These features are first passed to the Instance Classifier, which comprises a Feature Alignment Layer and a shallow Instance-Level Classifier, generating per-patch predictions and aligned feature representations. In parallel, features are aggregated in the Causal Bag Classifier via a multi-head Attention Module to produce a bag-level embedding. This embedding is further processed through Nonlinear Masking Blocks and a Causal Graph Module to capture structured dependencies and inject exogenous demographic priors (e.g., gender, race, age) for debiasing. In the Causal Graph Module, solid arrows denote causal edges that persist during both training and inference, while dashed arrows represent auxiliary edges used exclusively for gradient backpropagation during training. The resulting representations from both instance and bag branches are integrated in the Prediction Head, supporting diverse downstream tasks including classification, survival prediction, and fairness-aware modeling. Arrows denote the data flow and transformation at each module, with exogenous priors influencing the causal path through latent variable modulation.
  • Figure 2: Demographic distribution statistics on TCGA datasets.
  • Figure 3: Accuracy and fairness evaluation across gender, race, and age on TCGA datasets.
  • Figure 4: Attention visualization on the CAMELYON16 dataset.
  • Figure 5: UMAP Embedding Analysis on TCGA datasets.
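The Figure 1 caption describes patch features being aggregated into a bag-level embedding by a multi-head attention module. As a rough illustration of that step only, the sketch below implements generic attention-based MIL pooling in NumPy. It is not the authors' module: the parameters `V` and `w`, the hidden size, the number of heads, the concatenation of heads, and the random inputs are all assumptions; only the 2048-dimensional instance features come from the caption.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(features, V, w):
    """Multi-head attention pooling over the patches of one slide.

    features: (n_patches, d) instance features
    V:        (heads, d, hidden) per-head projection (hypothetical parameter)
    w:        (heads, hidden) per-head scoring vector (hypothetical parameter)
    Returns the bag embedding (heads * d,) and the weights (heads, n_patches).
    """
    hidden = np.tanh(np.einsum("nd,hdk->hnk", features, V))  # (heads, n, hidden)
    scores = np.einsum("hnk,hk->hn", hidden, w)               # (heads, n)
    attn = softmax(scores, axis=1)                            # sums to 1 over patches
    pooled = attn @ features                                  # (heads, d)
    return pooled.reshape(-1), attn                           # heads concatenated


# Random stand-ins for one slide; real features would come from the encoder.
rng = np.random.default_rng(0)
n_patches, d, hidden_dim, heads = 50, 2048, 64, 4
features = rng.standard_normal((n_patches, d))
V = rng.standard_normal((heads, d, hidden_dim)) / np.sqrt(d)
w = rng.standard_normal((heads, hidden_dim))

bag, attn = attention_pool(features, V, w)
print(bag.shape, attn.shape)
```

Because each head's weights form a distribution over patches, they can be mapped back to patch coordinates to draw heatmaps of the kind an attention visualization typically shows.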