Beyond single-model XAI: aggregating multi-model explanations for enhanced trustworthiness

Ilaria Vascotto, Alex Rodriguez, Alessandro Bonaita, Luca Bortolussi

TL;DR

Trustworthy AI requires robust explanations that withstand input variations and model disagreements. This work extends XAI by deriving kNN- and RF-specific feature attributions, combining them with NN explanations obtained via DeepLIFT, and aggregating across models to form a single, robust explanation. A local robustness estimator based on on-manifold neighbourhood perturbations assesses stability, while a medoid-based perturbation scheme preserves the data distribution and model predictions. Across five binary tabular datasets, the aggregation provides a conservative yet informative explanation, and NN explanations are typically more robust than kNN explanations. These results showcase the potential of multi-model explanation aggregation to enhance trust in high-stakes settings.

Abstract

The use of Artificial Intelligence (AI) models in real-world and high-risk applications has intensified the discussion about their trustworthiness and ethical usage, from both a technical and a legislative perspective. The field of eXplainable Artificial Intelligence (XAI) addresses this challenge by proposing explanations that bring to light the decision-making processes of complex black-box models. Although robustness is an essential property of explanations, it is often overlooked during development: only robust explanation methods can increase trust in the system as a whole. This paper investigates the role of robustness through a feature importance aggregation derived from multiple models ($k$-nearest neighbours, random forest and neural networks). Preliminary results showcase the potential for increasing the trustworthiness of the application while leveraging the predictive power of multiple models.
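To make the idea of aggregating feature importances across models concrete, the following is a minimal sketch, not the authors' method. It assumes each model (kNN, RF, NN) has already produced one attribution value per feature, reports the observed range across models for each feature, and combines the values with a plain arithmetic mean. The mean is only a stand-in, since the aggregation rule used in the paper is not stated in this excerpt; the function name `aggregate_attributions` and all attribution values are hypothetical.

```python
from statistics import mean


def aggregate_attributions(per_model):
    """Summarise per-feature attributions coming from several models.

    per_model maps a model name to a list of attribution values,
    one per feature. All lists must have the same length.
    NOTE: the mean used below is an assumed stand-in aggregation,
    not the rule defined in the paper.
    """
    names = list(per_model)
    n_features = len(per_model[names[0]])
    if any(len(per_model[name]) != n_features for name in names):
        raise ValueError("all models must attribute the same features")

    summary = []
    for j in range(n_features):
        values = [per_model[name][j] for name in names]
        summary.append({
            "feature": j + 1,           # 1-based feature index
            "low": min(values),         # lower end of the observed range
            "high": max(values),        # upper end of the observed range
            "aggregate": mean(values),  # assumed aggregation: simple mean
        })
    return summary


# Made-up attribution values for three features (illustration only).
attributions = {
    "kNN": [0.5, -0.25, 0.0],
    "RF": [0.25, -0.5, 0.25],
    "NN": [0.75, -0.75, 0.5],
}
summary = aggregate_attributions(attributions)
for row in summary:
    print(row)
```

A wide gap between `low` and `high` for a feature signals that the models disagree about its importance, which is the kind of disagreement a multi-model aggregation is meant to expose.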

Paper Structure

This paper contains 15 sections, 4 equations, 5 figures, 5 tables, 2 algorithms.

Figures (5)

  • Figure 1: A summary of the methodology.
  • Figure 2: An example of the feature attributions on the adult dataset. The kNN (blue), RF (orange) and NN (green) explanations are connected by a vertical black line (the observed range). The red star represents the aggregation, whose value is presented within brackets close to the corresponding feature index (from 1 to 12).
  • Figure 3: Distribution of the robustness scores for the three models and their aggregation on the bank (left) and heloc (right) datasets.
  • Figure 4: ROC curves computed for the three models and the aggregation on the bank (left) and heloc (right) datasets. The dotted line represents the diagonal (bisecting) line.
  • Figure A1: Distribution of the robustness scores for LIME, SHAP and DeepLIFT on the bank (left) and heloc (right) datasets. All three XAI methods were applied to the outputs of the neural network model for comparability. The gray dashed line represents a robustness threshold of $0.50$.