Enhancing Interpretability in Generative AI Through Search-Based Data Influence Analysis
Theodoros Aivalis, Iraklis A. Klampanos, Antonis Troumpoukis, Joemon M. Jose
TL;DR
This work tackles the interpretability gap in generative AI by introducing a model-agnostic, data-centric framework that links generated outputs to training data through a two-step, prompt-guided retrieval and comparison process. The method defines a Data Influence function $\mathrm{Data\ Influence}(x,y) = \sum_{i=1}^{m} \alpha(y,y_i) \cdot B(x,x_i)$, where $B$ is a textual filter and the weight $\alpha$ relies on a cosine-based kernel $K$ to score training samples by visual similarity, allowing observational explanations without access to internal gradients. It is validated through experiments on a locally trained DALL-E-like model and large-scale web-based evaluations, which show that removing influential samples reduces similarity to reference outputs and that retrieved training samples can surface copyright-related content. The approach has practical value for copyright tracing and dataset transparency, while highlighting the need for accessible training data metadata and for future enhancements, including domain expert evaluations and richer explainable representations.
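To make the Data Influence sum concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the authors' implementation. It assumes that $\alpha(y, y_i)$ is simply the cosine similarity between the embedding of the generated output and that of training image $i$, and that $B(x, x_i)$ is a binary filter that passes a training sample when its caption shares at least one word with the prompt. The function names `cosine_kernel`, `textual_filter` and `data_influence`, as well as the toy captions and two-dimensional embeddings, are hypothetical.

```python
import math


def cosine_kernel(a, b):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def textual_filter(prompt, caption):
    """Assumed form of B: 1 if prompt and caption share at least one word, else 0."""
    shared = set(prompt.lower().split()) & set(caption.lower().split())
    return 1 if shared else 0


def data_influence(prompt, output_emb, train_captions, train_embs):
    """Return the per-sample terms alpha(y, y_i) * B(x, x_i) and their sum."""
    terms = [
        cosine_kernel(output_emb, emb) * textual_filter(prompt, caption)
        for caption, emb in zip(train_captions, train_embs)
    ]
    return terms, sum(terms)


# Toy example: three training samples with made-up 2-D embeddings.
captions = ["a red apple", "a blue car", "green apple on table"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
terms, total = data_influence("apple", [1.0, 0.0], captions, embeddings)

# Rank training samples by their individual contribution, highest first.
ranking = sorted(range(len(terms)), key=lambda i: terms[i], reverse=True)
```

In this toy run the second sample is excluded by the textual filter, and the remaining samples are ranked by visual similarity to the generated output, which mirrors the two-step retrieval and comparison process described above.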
Abstract
Generative AI models offer powerful capabilities but often lack transparency, making it difficult to interpret their output. This is critical in cases involving artistic or copyrighted content. This work introduces a search-inspired approach to improve the interpretability of these models by analysing the influence of training data on their outputs. Our method provides observational interpretability by focusing on a model's output rather than on its internal state. We consider both raw data and latent-space embeddings when searching for the influence of data items in generated content. We evaluate our method by retraining models locally and by demonstrating the method's ability to uncover influential subsets in the training data. This work lays the groundwork for future extensions, including user-based evaluations with domain experts, which are expected to improve observational interpretability further.
