Explainable embeddings with Distance Explainer

Christiaan Meijer, E. G. Patrick Bos

TL;DR

Explaining proximity in embedded spaces is challenging because dimensions encode abstract concepts. The authors propose Distance Explainer, a local post-hoc method that adapts RISE by masking the to-be-explained item, computing distances to a fixed reference in embedding space via $d_\mathrm{cos}$, and aggregating masks with a distance-ranked mirror filter. They evaluate on ImageNet and CLIP across image-image and image-caption pairs, showing that the method identifies salient features driving similarity or dissimilarity while maintaining robustness and model dependency. The work addresses a gap in XAI for embedded spaces and provides practical guidance on parameter choices and extensions to multimodal and textual embeddings.

Abstract

While eXplainable AI (XAI) has advanced significantly, few methods address interpretability in embedded vector spaces where dimensions represent complex abstractions. We introduce Distance Explainer, a novel method for generating local, post-hoc explanations of embedded spaces in machine learning models. Our approach adapts saliency-based techniques from RISE to explain the distance between two embedded data points by assigning attribution values through selective masking and distance-ranked mask filtering. We evaluate Distance Explainer on cross-modal embeddings (image-image and image-caption pairs) using established XAI metrics including Faithfulness, Sensitivity/Robustness, and Randomization. Experiments with ImageNet and CLIP models demonstrate that our method effectively identifies features contributing to similarity or dissimilarity between embedded data points while maintaining high robustness and consistency. We also explore how parameter tuning, particularly mask quantity and selection strategy, affects explanation quality. This work addresses a critical gap in XAI research and enhances transparency and trustworthiness in deep learning applications utilizing embedded spaces.
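The masking, distance, and filtering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `embed` callable, the nearest-neighbour mask upsampling (RISE itself uses smoother masks), the `keep_fraction` parameter, and the sign convention are all assumptions made for illustration. In particular, the "mirror" filter is read here as keeping the masks that produce the smallest distances and, symmetrically, those that produce the largest, with opposite signs.

```python
import numpy as np

def cosine_distance(a, b):
    # d_cos = 1 - cosine similarity; the epsilon guards against all-zero inputs.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def random_masks(n_masks, shape, grid=4, p_keep=0.5, rng=None):
    # Coarse random binary grids, upsampled by simple repetition (an assumption;
    # a simplification of the smooth, shifted masks used by RISE).
    rng = np.random.default_rng(rng)
    h, w = shape
    coarse = rng.random((n_masks, grid, grid)) < p_keep
    rep_h, rep_w = -(-h // grid), -(-w // grid)  # ceiling division
    masks = np.repeat(np.repeat(coarse, rep_h, axis=1), rep_w, axis=2)
    return masks[:, :h, :w].astype(float)

def explain_distance(item, reference_embedding, embed,
                     n_masks=500, keep_fraction=0.2, rng=0):
    # item: 2D array to be explained; embed: hypothetical callable mapping an
    # array to a 1D embedding vector; the reference embedding stays fixed.
    masks = random_masks(n_masks, item.shape, rng=rng)
    distances = np.array([
        cosine_distance(embed(item * m), reference_embedding) for m in masks
    ])
    # Rank masks by distance and keep both tails (assumed reading of the
    # distance-ranked mirror filter).
    order = np.argsort(distances)
    k = max(1, int(keep_fraction * n_masks))
    closest, farthest = order[:k], order[-k:]
    # Positive values: regions whose presence brings the item closer to the
    # reference; negative values: regions whose presence pushes it away.
    return (masks[closest].sum(axis=0) - masks[farthest].sum(axis=0)) / k

if __name__ == "__main__":
    item = np.ones((8, 8))
    reference = np.zeros((8, 8))
    reference[:, :4] = 1.0           # reference "looks like" the left half only
    embed = lambda x: x.ravel()      # toy embedding, for illustration only
    saliency = explain_distance(item, embed(reference), embed)
    print(saliency[:, :4].mean(), saliency[:, 4:].mean())
```

In this toy setup the left half of the item should receive positive attribution and the right half negative attribution, because only the left half matches the reference. A real use would replace `embed` with a model such as an image encoder.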

Paper Structure

This paper contains 29 sections, 2 equations, 16 figures, 2 tables.

Figures (16)

  • Figure 1: Incremental deletion on the bee image whose distance to a fly's image is explained. Deleted pixels are coloured brown. From left to right, in every column a larger percentage of pixels is deleted. The top row has LoDF order. The middle row has HiDF order. The bottom row has random order.
  • Figure 2: The vertical axis shows distance under incremental deletion on the bee image whose distance to a fly image is explained. Percentage of deleted pixels on the horizontal axis.
  • Figure 3: Incremental deletion on the bee image whose distance to another bee's image is explained. See the caption of Figure 1 for a description of the rows and columns.
  • Figure 4: Like Figure 2, but with another bee image as reference item.
  • Figure 5: MPRT (top-down) results: The first image shows the saliency map for the unperturbed model. From left to right, each subsequent image shows the saliency map for a model in which one additional layer has its weights perturbed, starting with the final fully connected layer and ending with the weights of all layers in the model perturbed.
  • ...and 11 more figures
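
The incremental deletion procedure behind Figures 1 to 4 can be sketched roughly as follows: pixels are removed in some order, and the distance to the reference is recorded after each step. This is an illustrative sketch only. The captions do not spell out LoDF and HiDF, so the code uses plain ascending, descending, and random saliency orders as stand-ins, and `distance_to_reference`, `fill`, and `steps` are hypothetical names.

```python
import numpy as np

def incremental_deletion(item, saliency, distance_to_reference,
                         order="descending", steps=10, fill=0.0, rng=0):
    # Rank pixel indices by saliency (or randomly) to fix the deletion order.
    flat_saliency = saliency.ravel()
    if order == "descending":
        ranking = np.argsort(-flat_saliency)
    elif order == "ascending":
        ranking = np.argsort(flat_saliency)
    elif order == "random":
        ranking = np.random.default_rng(rng).permutation(flat_saliency.size)
    else:
        raise ValueError(f"unknown order: {order}")

    fractions = np.linspace(0.0, 1.0, steps + 1)
    distances = []
    for fraction in fractions:
        n_deleted = int(round(fraction * flat_saliency.size))
        deleted = item.astype(float).ravel().copy()
        deleted[ranking[:n_deleted]] = fill  # "delete" by overwriting with a fill value
        distances.append(distance_to_reference(deleted.reshape(item.shape)))
    return fractions, np.array(distances)
```

Plotting `distances` against `fractions` for each ordering gives curves of the kind described in the captions of Figures 2 and 4, with the fraction of deleted pixels on the horizontal axis and the distance on the vertical axis.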