Representational Difference Explanations
Neehar Kondapaneni, Oisin Mac Aodha, Pietro Perona
TL;DR
Representational Difference Explanations (RDX) is a training-free, difference-centric framework for contrasting two model representations and visualizing where they disagree. It constructs neighborhood-based distance matrices for each representation, applies a locally biased difference function, and samples difference explanations via spectral clustering, with optional alignment via centered kernel alignment (CKA). The output is a set of interpretable concept grids that highlight model-specific distinctions. Empirically, RDX reliably recovers known differences and uncovers previously unknown ones, both on MNIST-inspired tasks and on large vision models evaluated on ImageNet and iNaturalist, and it outperforms dictionary-learning XAI baselines on the primary metric (Binary Success Rate) and on related semantic metrics. RDX is a practical, broadly applicable tool for model comparison, with acknowledged limitations in scalability, its distance assumptions, and potential biases in external evaluators. Future work may extend RDX to text and multimodal representations and integrate supervised cues for enhanced interpretability.
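Centered kernel alignment, mentioned above as an optional alignment step, can be illustrated with a minimal sketch of the standard linear CKA similarity between two representation matrices. This is not the authors' implementation, and the summary does not specify how RDX uses CKA internally. The function name `linear_cka`, the array names, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representations of the same n inputs.

    X has shape (n, d1) and Y has shape (n, d2); row i of each matrix
    must be the embedding of the same input i. (Hypothetical helper.)
    """
    # Center every feature column so the score ignores constant offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)

# Synthetic stand-ins for two models' embeddings (assumed data).
rng = np.random.default_rng(0)
reps_a = rng.normal(size=(200, 16))

# A rotated and rescaled copy of reps_a: same geometry, different basis.
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
reps_rotated = 3.0 * reps_a @ Q

# An unrelated representation drawn independently.
reps_other = rng.normal(size=(200, 16))

score_same = linear_cka(reps_a, reps_rotated)
score_diff = linear_cka(reps_a, reps_other)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so `score_same` equals 1 up to floating-point error, while `score_diff` is much lower for independently drawn features. A global score of this kind says how similar two representations are overall, but not where they differ, which is the question the difference explanations are meant to answer.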
Abstract
We propose a method for discovering and visualizing the differences between two learned representations, enabling more direct and interpretable model comparisons. We validate our method, which we call Representational Difference Explanations (RDX), by using it to compare models with known conceptual differences and demonstrate that it recovers meaningful distinctions where existing explainable AI (XAI) techniques fail. Applied to state-of-the-art models on challenging subsets of the ImageNet and iNaturalist datasets, RDX reveals both insightful representational differences and subtle patterns in the data. Although comparison is a cornerstone of scientific analysis, current tools in machine learning, namely post hoc XAI methods, struggle to support model comparison effectively. Our work addresses this gap by introducing an effective and explainable tool for contrasting model representations.
