Beyond the Black Box: Do More Complex Deep Learning Models Provide Superior XAI Explanations?

Mateusz Cedro, Marcin Chlebus

TL;DR

This study investigates whether increasing deep learning model complexity yields superior XAI explanations in medical imaging. Using four ResNet variants trained from scratch on 4,369 chest X-rays (COVID-19 vs. healthy), it evaluates classification performance and explanation quality via two metrics, Relevance Rank Accuracy ($RRA$) and Positive Attribution Ratio ($PAR$), across three gradient-based XAI methods (Saliency Maps, GradientShap, Integrated Gradients). The key finding is that deeper models do not consistently improve accuracy or explanation relevance; in many cases, the simpler ResNet-18 provides comparable or superior XAI performance, challenging the assumption that complexity enhances interpretability. The work highlights the importance of context-specific model selection, proper XAI method configuration, and the need for empirical frameworks to validate explanations in healthcare AI.

Abstract

The increasing complexity of Artificial Intelligence models poses challenges to interpretability, particularly in the healthcare sector. This study investigates the relationship between deep learning model complexity and Explainable AI (XAI) efficacy, using four ResNet architectures (ResNet-18, 34, 50, 101). Through methodical experimentation on 4,369 lung X-ray images of COVID-19-infected and healthy patients, the research evaluates the models' classification performance and the relevance of the corresponding XAI explanations with respect to the ground-truth disease masks. Results indicate that the increase in model complexity is associated with a decrease in classification accuracy and AUC-ROC scores (ResNet-18: 98.4%, 0.997; ResNet-101: 95.9%, 0.988). Notably, in eleven out of twelve statistical tests performed, no statistically significant differences in the quantitative XAI metrics (Relevance Rank Accuracy and the proposed Positive Attribution Ratio) were found across the trained models. These results suggest that increased model complexity does not consistently lead to higher performance or to more relevant explanations of the models' decision-making processes.
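
To make the Relevance Rank Accuracy metric concrete, the sketch below follows the commonly used definition: take the K pixels with the highest attribution, where K is the number of pixels in the ground-truth mask, and report the fraction of them that fall inside the mask. This is a minimal NumPy illustration under stated assumptions, not the study's implementation. The function name `relevance_rank_accuracy`, the toy arrays, and the choice to rank raw rather than absolute attribution values are all hypothetical choices made for the example.

```python
import numpy as np


def relevance_rank_accuracy(attribution, mask):
    """Fraction of the K highest-attributed pixels that lie inside the mask.

    K is the number of pixels in the ground-truth mask. Ranking uses the raw
    attribution values (an assumption; some setups rank absolute values).
    """
    attribution = np.asarray(attribution, dtype=float).ravel()
    mask = np.asarray(mask, dtype=bool).ravel()
    k = int(mask.sum())
    if k == 0:
        # Undefined when the mask is empty.
        return float("nan")
    # Indices of the K largest attribution values (stable sort for ties).
    top_k = np.argsort(-attribution, kind="stable")[:k]
    return float(mask[top_k].sum() / k)


# Toy 3x3 "image": the mask covers the left column (K = 3).
toy_attribution = np.array([
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.00],
    [0.05, 0.70, 0.00],
])
toy_mask = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
])

rra = relevance_rank_accuracy(toy_attribution, toy_mask)
print(f"RRA = {rra:.3f}")
```

In the toy example, two of the three highest-attributed pixels lie inside the mask, so the score is 2/3. A score of 1.0 would mean that the explanation's most relevant pixels coincide exactly with the annotated region.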

Paper Structure (24 sections, 6 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Residual learning: a building block. Source: Own preparation based on He et al. (2015).
  • Figure 2: Comprehensive Workflow Schema of the Research Methodology. The abbreviations RRA and PAR stand for Relevance Rank Accuracy and Positive Attribution Ratio, respectively. Note: the model representation is illustrative and may not precisely reflect the original and specific number and type of layers of each model. Source: Own preparation.
  • Figure 3: Computer Vision network architectures. Left: 34-layer plain network. Right: 34-layer residual network. Dotted lines represent dimension-expanding connections. Source: Own preparation based on He et al. (2015).
  • Figure 4: X-rays of healthy and COVID-19 infected lungs, with corresponding ground-truth masks. Masks for healthy individuals encompass the entire lungs. In contrast, for COVID-19 patients, masks delineate areas identified by radiologists as diseased, potentially covering specific regions or the entirety of the lungs. Source: Own preparation.
  • Figure 5: X-rays of healthy and COVID-19-infected lungs, accompanied by gradient-based attributions from the Saliency Maps, GradientShap, and Integrated Gradients methods. Source: Own preparation using the Quantus library.
  • ...and 4 more figures