Beyond the Black Box: Do More Complex Deep Learning Models Provide Superior XAI Explanations?
Mateusz Cedro, Marcin Chlebus
TL;DR
This study investigates whether increasing deep learning model complexity yields superior XAI explanations in medical imaging. Four ResNet variants are trained from scratch on 4,369 chest X-rays (COVID-19 vs Healthy). The study measures their classification performance and scores explanation quality with two metrics, Relevance Rank Accuracy ($RRA$) and Positive Attribution Ratio ($PAR$), across three gradient-based XAI methods (Saliency Maps, GradientShap, Integrated Gradients). The key finding is that deeper models do not consistently improve accuracy or explanation relevance; in many cases, the simpler ResNet-18 provides comparable or superior XAI performance, challenging the assumption that complexity enhances interpretability. The work highlights the importance of context-specific model selection, proper XAI method configuration, and the need for empirical frameworks to validate explanations in healthcare AI.
Abstract
The increasing complexity of Artificial Intelligence models poses challenges to interpretability, particularly in the healthcare sector. This study investigates the relationship between deep learning model complexity and the efficacy of Explainable AI (XAI), using four ResNet architectures (ResNet-18, 34, 50, 101). Through methodical experimentation on 4,369 lung X-ray images of COVID-19-infected and healthy patients, the research evaluates the models' classification performance and the relevance of the corresponding XAI explanations with respect to the ground-truth disease masks. Results indicate that increasing model complexity is associated with a decrease in classification accuracy and AUC-ROC scores (ResNet-18: 98.4%, 0.997; ResNet-101: 95.9%, 0.988). Notably, in eleven out of twelve statistical tests performed, no statistically significant differences across the trained models were found in the two quantitative XAI metrics, Relevance Rank Accuracy and the proposed Positive Attribution Ratio. These results suggest that increased model complexity does not consistently lead to higher performance or to more relevant explanations of the models' decision-making processes.
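To make the comparison against ground-truth disease masks concrete, the following is a minimal sketch of Relevance Rank Accuracy as it is commonly defined in the XAI evaluation literature: take the K highest-attribution pixels, where K is the number of pixels in the mask, and report the fraction that falls inside the mask. The function name, argument names, and tie-breaking behaviour are assumptions made for illustration and are not taken from the paper; the proposed Positive Attribution Ratio is not sketched because its definition is not given in this section.

```python
import numpy as np


def relevance_rank_accuracy(attribution, mask):
    """Fraction of the top-K attributed pixels that lie inside the mask.

    K is the number of pixels in the ground-truth mask. This follows the
    commonly used definition of the metric; it is an illustrative sketch,
    not the authors' implementation.
    """
    attribution = np.asarray(attribution, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if attribution.shape != mask.shape:
        raise ValueError("attribution and mask must have the same shape")

    k = int(mask.sum())
    if k == 0:
        raise ValueError("ground-truth mask is empty")

    flat_attr = attribution.ravel()
    flat_mask = mask.ravel()

    # Indices of the K largest attribution values (ties broken arbitrarily).
    top_k = np.argsort(flat_attr, kind="stable")[::-1][:k]

    # Count how many of those indices fall inside the ground-truth mask.
    return float(flat_mask[top_k].sum()) / k
```

For example, with attribution `[[0.9, 0.1], [0.8, 0.2]]` and mask `[[1, 0], [0, 1]]`, K is 2, the two highest-attribution pixels are the ones valued 0.9 and 0.8, only the first lies inside the mask, and the score is 0.5.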
