Explainability for Machine Learning Models: From Data Adaptability to User Perception

Julien Delaunay

TL;DR

This work addresses the explainability of machine learning models by (i) developing data-centric explanation methods that adapt to feature types and data distributions (Anchors with MDLP/k-means discretization and Pertinent Negatives; Growing Fields for counterfactuals; Growing Language/Net for text), and (ii) conducting rigorous user-centered evaluations to understand how explanation type and representation affect trust and understanding. A key contribution is the Adapted Post-hoc Explanations (APE) framework, which benchmarks local linear explainability against rule-based methods, guided by a novel Oracle that assesses when linear surrogates are suitable around a target instance. The thesis demonstrates that carefully chosen explanations, and notably a hybrid mix of linear, rule-based, and counterfactual outputs, improve fidelity and user engagement, while revealing that linear explanations are not universally applicable. It also shows that transparent counterfactual approaches for text can outperform opaque latent-space methods in terms of minimality, plausibility, and stability. Overall, the work advances explainability by combining data-driven explanation design with user-centered validation, highlighting practical implications for transparency, trust, and usability in deployed AI systems.

Abstract

This thesis explores the generation of local explanations for already deployed machine learning models, aiming to identify the conditions under which explanations are meaningful given both data and user requirements. The primary goal is to develop methods that generate explanations for any model while ensuring that these explanations remain faithful to the underlying model and comprehensible to users. The thesis is divided into two parts. The first part enhances a widely used rule-based explanation method, introduces a novel approach for evaluating whether linear explanations are suitable to approximate a model, and runs a comparative experiment between two families of counterfactual explanation methods to analyze the advantages of one over the other. The second part focuses on user experiments that assess the impact of three explanation methods and two distinct representations. These experiments measure how users perceive their interaction with the model, in terms of understanding and trust, depending on the explanations and representations. This research contributes to better explanation generation, with potential implications for enhancing the transparency, trustworthiness, and usability of deployed AI systems.
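To make the idea of checking whether a linear explanation suits a given instance more concrete, the following is a minimal sketch and not the thesis's own procedure: it fits an ordinary least-squares surrogate to a black-box model on points sampled around a target instance and reports the local R² as a fidelity score. The function name `local_linear_fidelity`, the two toy models, the sampling radius, and the use of R² as the criterion are all assumptions made for illustration.

```python
import numpy as np


def local_linear_fidelity(predict, x, radius=0.5, n_samples=2000, seed=0):
    """Fit a linear surrogate to `predict` around `x`.

    Returns the surrogate's feature coefficients and its local R^2.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Sample perturbations uniformly in a hypercube centred on the instance.
    Z = x + rng.uniform(-radius, radius, size=(n_samples, x.shape[0]))
    y = predict(Z)
    # Ordinary least squares with an explicit intercept column.
    A = np.hstack([Z, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return coef[:-1], r2


def smooth_model(Z):
    # Toy black box: a logistic score, close to linear near the origin.
    return 1.0 / (1.0 + np.exp(-(2.0 * Z[:, 0] - Z[:, 1])))


def xor_model(Z):
    # Toy black box with an XOR-like decision surface around the origin.
    return ((Z[:, 0] > 0) ^ (Z[:, 1] > 0)).astype(float)


x = np.array([0.0, 0.0])
coef_smooth, r2_smooth = local_linear_fidelity(smooth_model, x)
coef_xor, r2_xor = local_linear_fidelity(xor_model, x)
print(f"smooth model: local R^2 = {r2_smooth:.3f}")
print(f"xor model:    local R^2 = {r2_xor:.3f}")
```

In this toy setting the surrogate tracks the smooth model closely but explains almost none of the XOR-like model's behaviour, which illustrates why a linear explanation may be appropriate for one instance and misleading for another.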

Paper Structure (250 sections, 30 equations, 87 figures, 44 tables, 18 algorithms)

Foundations of Explainability
Explainable AI
Self-Explainable vs Post-Hoc Explanations
Global vs Local Explanations
Model Dependent vs Model Agnostic
Explanation Paradigms
Notation
Rule-based Explanations
Feature-Attribution
Example-based Explanations
Evaluating Explanation Techniques
Surrogate-Based Evaluation Criteria
Adherence and Fidelity
Stability and Uncertainty
Simplicity and Conciseness
...and 235 more sections

Figures (87)

Figure 1: Visual representation of a machine learning model designed to detect fake news within newspaper titles. This model is trained on diverse labeled examples to accurately classify news articles as either "fake" or "true". The aspect of explainability is added to the system to provide comprehensive and transparent insights into the model's predictions.
Figure 2: Illustration of two models, a linear model and a decision tree, employed to predict loan approval likelihood based on applicants' age and salary. Blue circles represent loan rejections, while red stars represent loan approvals.
Figure 3: Taxonomy of the explanation techniques. Paths in green represent the explanation techniques studied in this thesis.
Figure 4: Rule-based explanation for a sentiment classification model. This rule specifies that the model predicted positively due to the presence of the words 'truly' and 'interesting'.
Figure 5: Feature-attribution explanation for a sentiment classification model. The length of each bar represents the extent to which the presence of a specific word in the sentence influences the model's prediction toward the corresponding class (positive or negative).
...and 82 more figures
