Explainable embeddings with Distance Explainer
Christiaan Meijer, E. G. Patrick Bos
TL;DR
Explaining proximity in embedded spaces is challenging because the dimensions encode abstract concepts. The authors propose Distance Explainer, a local post-hoc method that adapts RISE: it masks the to-be-explained item, computes the cosine distance $d_\mathrm{cos}$ between each masked item and a fixed reference in embedding space, and aggregates the masks with a distance-ranked mirror filter. They evaluate the method with ImageNet and CLIP models on image-image and image-caption pairs, showing that it identifies the salient features that drive similarity or dissimilarity, and that the resulting explanations are robust and depend on the model being explained. The work addresses a gap in XAI for embedded spaces and provides practical guidance on parameter choices and on extensions to multimodal and textual embeddings.
Abstract
While eXplainable AI (XAI) has advanced significantly, few methods address interpretability in embedded vector spaces where dimensions represent complex abstractions. We introduce Distance Explainer, a novel method for generating local, post-hoc explanations of embedded spaces in machine learning models. Our approach adapts saliency-based techniques from RISE to explain the distance between two embedded data points by assigning attribution values through selective masking and distance-ranked mask filtering. We evaluate Distance Explainer on image-image pairs and on cross-modal image-caption pairs using established XAI metrics including Faithfulness, Sensitivity/Robustness, and Randomization. Experiments with ImageNet and CLIP models demonstrate that our method effectively identifies features contributing to similarity or dissimilarity between embedded data points while maintaining high robustness and consistency. We also explore how parameter tuning, particularly mask quantity and selection strategy, affects explanation quality. This work addresses a critical gap in XAI research and enhances transparency and trustworthiness in deep learning applications that use embedded spaces.
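The masking, distance, and filtering steps summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The names `explain_distance`, `embed`, `p_keep`, and `mask_selection_fraction` are hypothetical, the masks are plain per-feature random binary masks rather than RISE-style smooth image masks, and the mirror filter is read here, as an assumption, as "average the masks that yield the smallest distances and subtract the average of the masks that yield the largest distances".

```python
import numpy as np


def cosine_distance(u, v):
    """d_cos(u, v) = 1 - (u . v) / (||u|| ||v||)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        # Assumption: treat a zero vector as maximally uninformative.
        return 1.0
    return 1.0 - float(u @ v) / denom


def explain_distance(x, reference_embedding, embed, n_masks=2000, p_keep=0.5,
                     mask_selection_fraction=0.1, seed=0):
    """Attribute the distance between embed(x) and a fixed reference to features of x.

    Positive values mark features whose presence pulls x toward the reference;
    negative values mark features whose presence pushes it away.
    """
    rng = np.random.default_rng(seed)
    # One random binary mask per row: 1 keeps a feature, 0 masks it out.
    masks = (rng.random((n_masks, x.size)) < p_keep).astype(float)
    # Distance from each masked input to the fixed reference embedding.
    distances = np.array([
        cosine_distance(embed(x * m.reshape(x.shape)), reference_embedding)
        for m in masks
    ])
    # Rank masks by distance and keep only both extremes (assumed mirror filter).
    order = np.argsort(distances)
    k = max(1, int(mask_selection_fraction * n_masks))
    closest = masks[order[:k]]
    farthest = masks[order[-k:]]
    saliency = closest.mean(axis=0) - farthest.mean(axis=0)
    return saliency.reshape(x.shape)


# Toy setup (hypothetical): the "model" is the identity map, and the reference
# only shares the first four features with the item being explained.
x = np.ones(8)
reference = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
saliency = explain_distance(x, reference, embed=lambda v: v)
print(np.round(saliency, 2))
```

In this toy setting the first four features should receive positive attribution, because keeping them moves the masked item toward the reference, while the last four should on average receive negative attribution. With a real model, `embed` would wrap the model's encoder and the masks would be applied to image regions or text tokens.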
