Beyond single-model XAI: aggregating multi-model explanations for enhanced trustworthiness
Ilaria Vascotto, Alex Rodriguez, Alessandro Bonaita, Luca Bortolussi
TL;DR
Trustworthy AI requires robust explanations that withstand input variations and model disagreements. This work extends XAI by deriving kNN- and RF-specific feature attributions, combining them with DeepLIFT explanations for NNs, and aggregating across models to form a single, robust explanation. A local robustness estimator based on on-manifold neighbourhood perturbations assesses stability, while a medoid-based perturbation scheme preserves the data distribution and model predictions. Across five binary tabular datasets, the aggregation provides a conservative yet informative explanation, and NN explanations are typically more robust than kNN ones, showcasing the potential of multi-model explanation aggregation to enhance trust in high-stakes settings.
Abstract
The use of Artificial Intelligence (AI) models in real-world and high-risk applications has intensified the discussion about their trustworthiness and ethical usage, from both a technical and a legislative perspective. The field of eXplainable Artificial Intelligence (XAI) addresses this challenge by proposing explanations that bring to light the decision-making processes of complex black-box models. Although robustness is an essential property of explanations, it is often overlooked during development: only robust explanation methods can increase trust in the system as a whole. This paper investigates the role of robustness through a feature importance aggregation derived from multiple models ($k$-nearest neighbours, random forest and neural networks). Preliminary results showcase the potential for increasing the trustworthiness of the application while leveraging the predictive power of multiple models.
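To make the idea of aggregating feature importances across models concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes each model has already produced a signed attribution vector for the same input, rescales each vector to unit L1 norm so that the models are comparable, and takes the plain mean. The function names `normalize` and `aggregate`, the choice of L1 scaling and mean aggregation, and the attribution values are all illustrative assumptions.

```python
import numpy as np

def normalize(attr):
    """Scale an attribution vector to unit L1 norm (assumed normalization)."""
    attr = np.asarray(attr, dtype=float)
    total = np.abs(attr).sum()
    # Leave an all-zero vector unchanged to avoid dividing by zero.
    return attr / total if total > 0 else attr

def aggregate(attributions):
    """Average the normalized attribution vectors, one per model.

    The plain mean is an assumption made for illustration; the paper
    may use a different aggregation rule.
    """
    stacked = np.stack([normalize(a) for a in attributions])
    return stacked.mean(axis=0)

# Hypothetical attributions for one input with three features.
knn_attr = [0.2, -0.1, 0.7]   # k-nearest neighbours
rf_attr = [2.0, -1.0, 1.0]    # random forest
nn_attr = [0.5, 0.0, 0.5]     # neural network (e.g. from DeepLIFT)

agg = aggregate([knn_attr, rf_attr, nn_attr])
print(agg)
```

Under this sketch, a feature receives a large aggregated score only when several models agree on its sign and relative magnitude, which is one simple way to obtain a conservative combined explanation.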
