Beyond the Veil of Similarity: Quantifying Semantic Continuity in Explainable AI

Qi Huang, Emanuele Mezzi, Osman Mutlu, Miltiadis Kofinas, Vidya Prasad, Shadnan Azwad Khan, Elena Ranguelova, Niki van Stein

TL;DR

This paper tackles the challenge of interpretability by introducing semantic continuity as a quantitative criterion for XAI explanations: similar inputs should yield similar explanations. It defines a formal metric to measure this property and evaluates it in image classification using simple shape variations and a synthetic facial dataset, comparing the RISE, LIME, GradCAM, and KernelSHAP explainers. The findings indicate that GradCAM offers the strongest semantic continuity, with KernelSHAP close behind; RISE shows middle-ground performance in several scenarios, and LIME tends to be less stable. The work provides a practical framework and metrics for assessing XAI explanations, enabling more reliable explainer selection and advancing transparent AI across domains.

Abstract

We introduce a novel metric for measuring semantic continuity in Explainable AI methods and machine learning models. We posit that for models to be truly interpretable and trustworthy, similar inputs should yield similar explanations, reflecting a consistent semantic understanding. By leveraging XAI techniques, we assess semantic continuity in the task of image recognition. We conduct experiments to observe how incremental changes in input affect the explanations provided by different XAI methods. Through this approach, we aim to evaluate the models' capability to generalize and abstract semantic concepts accurately and to evaluate different XAI methods in correctly capturing the model behaviour. This paper contributes to the broader discourse on AI interpretability by proposing a quantitative measure for semantic continuity for XAI methods, offering insights into the models' and explainers' internal reasoning processes, and promoting more reliable and transparent AI systems.
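
To make the idea of "similar inputs should yield similar explanations" concrete, the sketch below shows one way such a check could be set up. It is not the metric defined in the paper: the use of Pearson correlation and L2 distance, and every name in it (`explanation_distances`, `continuity_score`, `toy_explainer`, `make_series`), are assumptions made for illustration. The toy explainer is a hypothetical stand-in for a real method such as GradCAM or LIME.

```python
import numpy as np


def explanation_distances(explanations):
    """L2 distance of each explanation map from the first (reference) map."""
    maps = np.asarray(explanations, dtype=float)
    reference = maps[0]
    flat_diff = (maps - reference).reshape(len(maps), -1)
    return np.linalg.norm(flat_diff, axis=1)


def continuity_score(variation, explanations):
    """Pearson correlation between semantic variation and explanation change.

    Illustrative proxy only (an assumption, not the paper's definition).
    Returns 0.0 when either series is constant and correlation is undefined.
    """
    variation = np.asarray(variation, dtype=float)
    distances = explanation_distances(explanations)
    if np.std(variation) == 0 or np.std(distances) == 0:
        return 0.0
    return float(np.corrcoef(variation, distances)[0, 1])


def toy_explainer(image):
    """Hypothetical saliency method: absolute intensity, normalised to sum to 1."""
    saliency = np.abs(image)
    total = saliency.sum()
    return saliency / total if total > 0 else saliency


def make_series(steps=10, size=8):
    """Synthetic series: a fixed square plus a bar that fades in with the variation."""
    variation = np.linspace(0.0, 1.0, steps)
    base = np.zeros((size, size))
    base[2:6, 2:6] = 1.0
    patch = np.zeros((size, size))
    patch[0:2, :] = 1.0
    images = [base + v * patch for v in variation]
    return variation, images


variation, images = make_series()
explanations = [toy_explainer(img) for img in images]
print("continuity score:", continuity_score(variation, explanations))
```

Under these assumptions, a score near 1 means the explanation drifts steadily as the semantic variation grows, while a score near 0 means the explanation changes erratically or not at all. A rank-based correlation could be swapped in if only monotonicity, rather than linearity, is of interest.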

Paper Structure (22 sections, 5 equations, 11 figures)


Figures (11)

  • Figure 1: Demonstrations of the data used for the proof-of-concept experiment.
  • Figure 2: Examples of training data used in our second experiment.
  • Figure 3: An example sub-series of a test case in our second experiment. The leftmost image shows a randomly generated face of a synthetic (non-real) girl without glasses. From left to right, generative models gradually add a pair of half-rimless glasses to the image.
  • Figure 4: An example sub-series of a test case for which the definition of semantic variation does not hold. The setting is the same as in Figure 3. However, as the semantic variation (indicator) increases, the fidelity of the glasses in the images does not keep increasing toward the end of the series, and on human inspection the final image does not convincingly show a face with glasses.
  • Figure 5: Rotation transformation.
  • ...and 6 more figures

Theorems & Definitions (4)

  • Definition
  • Definition: Semantic variation
  • Definition: Predictor semantic continuity
  • Definition: Explainer semantic continuity