On the notion of missingness for path attribution explainability methods in medical settings: Guiding the selection of medically meaningful baselines
Alexander Geiger, Lars Wagner, Daniel Rueckert, Dirk Wilhelm, Alissa Jell
TL;DR
This paper tackles explainability in medical AI by revisiting the baseline choices used in path attribution methods such as Integrated Gradients (IG). It introduces a principled, input-specific counterfactual baseline that represents a clinically normal state of the input. In the authors' implementation this baseline is generated with a variational autoencoder, and it enables more faithful attributions. The approach applies IG with counterfactual baselines (CF) and extends the idea to an Expected Gradients variant (EG(CF)). Across three medical datasets (Manometry, Chest X-ray, Brain MRI), it shows superior localization of, and closer alignment with, ground-truth pathology than standard baselines and Latent Integrated Gradients (LIG). The results underscore the importance of semantically meaningful baselines for trustworthy explanations and offer a model-agnostic framework that can be combined with other counterfactual methods to improve clinical interpretability.
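For reference, the standard Integrated Gradients attribution for a model F, an input x, and a baseline x' is written below, together with its completeness property. The CF variant substitutes a counterfactual x_CF for x'. The Expected Gradients line is the usual form, which averages over baselines drawn from a distribution D. Reading EG(CF) as drawing those baselines from a set of counterfactuals D_CF(x) for the given input is an assumption made here for illustration; the summary does not spell out the exact construction.

```latex
\mathrm{IG}_i(x; x') = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i}\, d\alpha,
\qquad \sum_i \mathrm{IG}_i(x; x') = F(x) - F(x')

\mathrm{IG}(\mathrm{CF}):\quad x' = x_{\mathrm{CF}}

\mathrm{EG}_i(x) = \mathbb{E}_{x' \sim \mathcal{D},\ \alpha \sim U(0,1)}
\left[ (x_i - x'_i)\, \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i} \right],
\qquad \mathrm{EG}(\mathrm{CF}) \text{ (assumed): } \mathcal{D} = \mathcal{D}_{\mathrm{CF}}(x)
```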
Abstract
The explainability of deep learning models remains a significant challenge, particularly in the medical domain, where interpretable outputs are critical for clinical trust and transparency. Path attribution methods such as Integrated Gradients rely on a baseline representing the absence of relevant features ("missingness"). Commonly used baselines, such as all-zero inputs, are often semantically meaningless, especially in medical contexts. While alternative baseline choices have been explored, existing methods lack a principled approach to dynamically selecting baselines tailored to each input. In this work, we examine the notion of missingness in the medical context, analyze its implications for baseline selection, and introduce a counterfactual-guided approach to address the limitations of conventional baselines. We argue that a generated counterfactual (i.e., a clinically "normal" variation of the pathological input) is a more accurate representation of a meaningful absence of features. We use a Variational Autoencoder in our implementation, though our concept is model-agnostic and can be applied with any suitable counterfactual method. We evaluate our concept on three distinct medical datasets and empirically demonstrate that counterfactual baselines yield more faithful and medically relevant attributions, outperforming standard baseline choices as well as other related methods.
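To make the role of the baseline concrete, the following is a minimal sketch that computes Integrated Gradients with a midpoint Riemann sum for a toy logistic model and compares an all-zero baseline with a "counterfactual" baseline. Everything here is an illustrative assumption: the model, its weights, the choice of features 2 and 3 as the "lesion", and `x_cf`, which simply stands in for the clinically normal counterfactual that the paper generates with a Variational Autoencoder (the VAE is not reproduced here).

```python
import numpy as np

# Toy differentiable "classifier" (hypothetical): logistic probability of pathology.
def model(x, w, b):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Analytic gradient of the logistic model with respect to the input x.
def model_grad(x, w, b):
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=256):
    """IG along the straight line from baseline to x, via a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([model_grad(baseline + a * (x - baseline), w, b) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical setup: a "normal" signal of ones, with a lesion added at features 2 and 3.
w = np.array([0.5, -0.3, 1.2, 0.8, 0.1, -0.6, 0.4, 0.2])
b = -2.0
x_normal = np.ones(8)
x = x_normal.copy()
x[[2, 3]] += 3.0

# Stand-in for a generated counterfactual: the same input with the lesion removed.
x_cf = x_normal
x_zero = np.zeros(8)

ig_cf = integrated_gradients(x, x_cf, w, b)
ig_zero = integrated_gradients(x, x_zero, w, b)

print("IG, counterfactual baseline:", np.round(ig_cf, 3))
print("IG, zero baseline:          ", np.round(ig_zero, 3))
# Completeness check: attributions sum to F(x) - F(baseline), up to discretization error.
print(ig_cf.sum(), model(x, w, b) - model(x_cf, w, b))
print(ig_zero.sum(), model(x, w, b) - model(x_zero, w, b))
```

With the counterfactual baseline, features that already match the normal state receive exactly zero attribution, because x - x_cf vanishes there. The zero baseline, by contrast, spreads attribution over every feature with a nonzero weight, even though only features 2 and 3 carry the simulated pathology. In both cases the attributions sum approximately to F(x) - F(x'), the completeness property of IG.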
