Benchmarking the Influence of Pre-training on Explanation Performance in MR Image Classification
Marta Oliveira, Rick Wilming, Benedict Clark, Céline Budding, Fabian Eitel, Kerstin Ritter, Stefan Haufe
TL;DR
This work introduces a realistic synthetic MRI benchmark that overlays ground-truth lesion masks on real brain scans, enabling objective evaluation of explanation quality across XAI methods. It systematically investigates how pre-training (within-domain MRI vs. out-of-domain ImageNet) and layer-wise fine-tuning affect both classification accuracy and explanation performance, using a VGG-16 backbone and several explanation methods implemented in Captum. The study finds a strong correlation between classification performance and explanation quality. Beyond this correlation, MRI-domain (within-domain) pre-training yields better explanations when accuracy is matched, as well as more stable explanations across a range of accuracies. The benchmark provides a principled framework for validating XAI methods in medical imaging and highlights the need to account for the training regime when interpreting explanation heatmaps in high-stakes settings.
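The benchmark's core idea, scoring an attribution map against a known lesion mask, can be illustrated with a minimal NumPy sketch. The function names `make_sample` and `explanation_precision`, the additive-intensity lesion model, and the top-k precision metric are assumptions made for illustration only; they are not necessarily the construction or the metric used in this work.

```python
import numpy as np


def make_sample(scan: np.ndarray, lesion_mask: np.ndarray, label: int,
                intensity: float = 0.5) -> np.ndarray:
    """Hypothetical lesion model: brighten the masked voxels of class-1 scans."""
    sample = scan.astype(float)  # astype returns a copy, so `scan` is untouched
    if label == 1:
        sample[lesion_mask] += intensity
    return sample


def explanation_precision(attribution: np.ndarray, lesion_mask: np.ndarray) -> float:
    """Assumed metric: fraction of the k most strongly attributed pixels that
    fall inside the ground-truth mask, with k equal to the mask size."""
    k = int(lesion_mask.sum())
    if k == 0:
        raise ValueError("ground-truth mask is empty")
    scores = np.abs(attribution).ravel()
    top_k = np.argsort(scores)[-k:]  # indices of the k largest |attributions|
    return float(lesion_mask.ravel()[top_k].mean())


# Toy usage on an 8x8 "scan" with a 2x2 lesion.
rng = np.random.default_rng(0)
scan = rng.random((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:4, 2:4] = True

x = make_sample(scan, mask, label=1)
print(explanation_precision(mask.astype(float), mask))     # 1.0: lesion fully highlighted
print(explanation_precision((~mask).astype(float), mask))  # 0.0: only background highlighted
```

Because the lesion location is known by construction, any attribution method can be scored this way without relying on visual inspection of heatmaps.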
Abstract
Convolutional Neural Networks (CNNs) are frequently and successfully used in medical prediction tasks. They are often combined with transfer learning, which improves performance when training data for the task are scarce. The resulting models are highly complex and typically provide no insight into their predictive mechanisms, motivating the field of "explainable" artificial intelligence (XAI). However, previous studies have rarely evaluated the "explanation performance" of XAI methods quantitatively against ground-truth data, and the influence of transfer learning on objective measures of explanation performance has not been investigated. Here, we propose a benchmark dataset that allows for quantifying explanation performance in a realistic magnetic resonance imaging (MRI) classification task. We employ this benchmark to understand the influence of transfer learning on the quality of explanations. Experimental results show that popular XAI methods applied to the same underlying model differ vastly in explanation performance, even when considering only correctly classified examples. We further observe that explanation performance strongly depends on the task used for pre-training and the number of CNN layers pre-trained. These results hold after correcting for a substantial correlation between explanation and classification performance.
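The abstract does not specify how the correlation between explanation and classification performance is corrected for. One common approach, assumed here purely for illustration, is to regress explanation performance on accuracy with ordinary least squares and compare pre-training regimes on the residuals. The helper `residualize` is hypothetical, and the inputs are synthetic (two made-up regimes with a built-in 0.05 offset), not results from the study.

```python
import numpy as np


def residualize(expl_perf, accuracy) -> np.ndarray:
    """Remove the linear (OLS) dependence of explanation performance on accuracy."""
    y = np.asarray(expl_perf, dtype=float)
    acc = np.asarray(accuracy, dtype=float)
    X = np.column_stack([np.ones_like(acc), acc])  # intercept + accuracy
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


# Synthetic illustration only: two hypothetical pre-training regimes with the
# same spread of accuracies, where regime 1 has a built-in +0.05 advantage.
rng = np.random.default_rng(0)
acc = np.tile(np.linspace(0.60, 0.95, 20), 2)
regime = np.repeat([0, 1], 20)
expl = 0.8 * acc + 0.05 * regime + rng.normal(0.0, 0.01, size=acc.size)

res = residualize(expl, acc)
gap = res[regime == 1].mean() - res[regime == 0].mean()
print(f"accuracy-adjusted gap: {gap:.3f}")  # close to the built-in 0.05
```

Comparing regimes on residuals asks whether one regime explains better than expected for its accuracy. Including the regime as a covariate in the regression, or matching models on accuracy, would be reasonable alternatives.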
