Enhancing Interpretability in Generative AI Through Search-Based Data Influence Analysis

Theodoros Aivalis, Iraklis A. Klampanos, Antonis Troumpoukis, Joemon M. Jose

TL;DR

This work tackles the interpretability gap in generative AI by introducing a model-agnostic, data-centric framework that links generated outputs to training data through a two-step, prompt-guided retrieval and comparison process. The method defines a Data Influence function $\mathrm{DataInfluence}(x,y) = \sum_{i=1}^{m} \alpha(y,y_i) \cdot B(x,x_i)$, with a textual filter $B$ and a cosine-based kernel $K$ that weights training samples by visual similarity, enabling observational explanations without access to internal gradients. It is validated through experiments on a locally trained DALL-E-like model and large-scale web-based evaluations, which show that removing influential samples reduces similarity to reference outputs and that retrieved training samples can reflect copyright-related content. The approach has practical value for copyright tracing and dataset transparency, while highlighting the need for accessible training-data metadata and future enhancements such as domain-expert evaluations and richer explainable representations.
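To make the Data Influence sum concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes the following:

- Prompts, training captions, training images and the generated image are already embedded as vectors, for example by a joint text-image encoder.
- $B(x, x_i)$ is a hypothetical binary filter that passes a training sample when its caption's cosine similarity to the prompt meets a threshold.
- $\alpha(y, y_i)$ is the cosine kernel on image embeddings.

All function names and the `threshold` parameter are hypothetical.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0


def textual_filter(prompt_emb, caption_emb, threshold=0.5) -> float:
    """Hypothetical B(x, x_i): 1.0 if the caption is textually close to the prompt, else 0.0."""
    return 1.0 if cosine(prompt_emb, caption_emb) >= threshold else 0.0


def data_influence(prompt_emb, output_emb, caption_embs, image_embs, threshold=0.5):
    """Sum over m training samples of alpha(y, y_i) * B(x, x_i).

    Assumption: alpha is the cosine kernel between the generated image embedding
    and each training image embedding. Returns (total score, per-sample terms).
    """
    contributions = []
    for cap, img in zip(caption_embs, image_embs):
        alpha = cosine(output_emb, img)                   # visual similarity weight
        b = textual_filter(prompt_emb, cap, threshold)    # textual gate
        contributions.append(alpha * b)
    return sum(contributions), contributions


def top_influential(contributions, k=5):
    """Indices of the k training samples with the largest contribution, highest first."""
    return sorted(range(len(contributions)), key=lambda i: contributions[i], reverse=True)[:k]
```

A training sample whose caption fails the textual filter contributes nothing, however similar its image is. The per-sample terms can therefore be ranked to surface a candidate influential subset.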

Abstract

Generative AI models offer powerful capabilities but often lack transparency, making their outputs difficult to interpret. This is critical in cases involving artistic or copyrighted content. This work introduces a search-inspired approach to improve the interpretability of these models by analysing the influence of training data on their outputs. Our method provides observational interpretability by focusing on a model's output rather than on its internal state. We consider both raw data and latent-space embeddings when searching for the influence of data items in generated content. We evaluate our method by retraining models locally and by demonstrating the method's ability to uncover influential subsets in the training data. This work lays the groundwork for future extensions, including user-based evaluations with domain experts, which are expected to improve observational interpretability further.

Paper Structure

This paper contains 12 sections, 6 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of the proposed pipeline. Phase one performs text-based retrieval by comparing the user's prompt with the text descriptions of the training samples. Phase two refines the results by comparing the retrieved samples with the generated output with both textual and visual features.
  • Figure 2: Sample output from the locally trained DALL·E model, used to capture key characteristics for detailed analysis.
  • Figure 3: Comparison of Stable Diffusion generated images with DDG-retrieved images, demonstrating the effectiveness of our retrieval-based method.
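The two-phase pipeline in Figure 1 can be sketched roughly as follows. This is an illustrative sketch, not the paper's code. It makes these assumptions:

- All inputs are precomputed embedding vectors.
- Phase one ranks training captions by cosine similarity to the prompt and keeps the top `k`.
- Phase two re-ranks those candidates with a weighted mix (hypothetical weight `w_text`) of two scores. The first is textual similarity to a text embedding describing the generated output (`gen_text_emb`, which could be the prompt itself or an automatic caption). The second is visual similarity between the generated image and each training image.

The function names and parameters are hypothetical.

```python
import numpy as np


def cosine_matrix(query: np.ndarray, items: np.ndarray) -> np.ndarray:
    """Cosine similarity between a 1-D query vector and each row of a 2-D array."""
    q = query / max(np.linalg.norm(query), 1e-12)
    norms = np.maximum(np.linalg.norm(items, axis=1, keepdims=True), 1e-12)
    return (items / norms) @ q


def two_phase_retrieval(prompt_emb, gen_text_emb, gen_img_emb,
                        caption_embs, image_embs, k=100, top_n=10, w_text=0.5):
    """Return (training-sample indices, scores) ranked by influence, highest first."""
    # Phase one: text-based retrieval, comparing the prompt with training captions.
    text_scores = cosine_matrix(prompt_emb, caption_embs)
    candidates = np.argsort(-text_scores)[:k]

    # Phase two: refine the candidates against the generated output,
    # combining textual and visual similarity (weighting is an assumption).
    t = cosine_matrix(gen_text_emb, caption_embs[candidates])
    v = cosine_matrix(gen_img_emb, image_embs[candidates])
    combined = w_text * t + (1.0 - w_text) * v

    order = np.argsort(-combined)[:top_n]
    return candidates[order], combined[order]
```

Restricting phase two to the `k` text-retrieved candidates keeps the costlier visual comparison small. The trade-off is that any training image whose caption does not match the prompt is never considered.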