Training Data Attribution for Image Generation using Ontology-Aligned Knowledge Graphs

Theodoros Aivalis, Iraklis A. Klampanos, Antonis Troumpoukis, Joemon M. Jose

TL;DR

The paper tackles transparency and copyright attribution in generative image systems by proposing ontology-guided knowledge graphs (KGs) constructed from images via multimodal LLMs. It introduces an end-to-end pipeline that extracts ontology-aligned semantic triples, stores per-image KGs, and performs graph-based retrieval to trace training-data influence, complemented by unlearning experiments. The authors validate the approach on locally trained models using fashion data and on a stylised Ghibli domain, showing competitive attribution performance and better interpretability than latent embeddings. The work advances dataset transparency and attribution in generative AI and suggests scalable extensions to broader domains and richer semantic reasoning.

Abstract

As generative models become more powerful, concerns around transparency, accountability, and copyright violations have intensified. Understanding how specific training data contributes to a model's output is critical. We introduce a framework for interpreting generative outputs through the automatic construction of ontology-aligned knowledge graphs (KGs). While automatic KG construction from natural text has advanced, extracting structured and ontology-consistent representations from visual content remains challenging due to the richness and multi-object nature of images. Leveraging multimodal large language models (LLMs), our method extracts structured triples from images, aligned with a domain-specific ontology. By comparing the KGs of generated and training images, we can trace potential influences, enabling copyright analysis, dataset transparency, and interpretable AI. We validate our method through experiments on locally trained models via unlearning, and on large-scale models through a style-specific experiment. Our framework supports the development of AI systems that foster human collaboration and creativity and stimulate curiosity.

Paper Structure

This paper contains 31 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Pipeline overview. A multimodal LLM, prompted with a domain-specific ontology, extracts semantic triples from images. These are stored as per-image KGs and compared via node overlap to rank semantic similarity.
  • Figure 2: Example output of our system, which extracts semantic triples and constructs a KG from a fashion image. The left shows the prompt used to instruct the multimodal LLM. The right visualises the extracted knowledge as a graph, which is star-shaped here because the dataset shows a single garment per image.
  • Figure 3: Comparison of similarity methods. Both identify the images as highly similar (baseline score: 0.82). The KG-based method also explains why by revealing shared attributes between the two images. Green edges indicate relationships whose attributes are identical across both graphs.
  • Figure 4: Experimental pipeline overview. The left illustrates the "Ghibli World" from movies. The right shows $\Delta_{\text{style}}$. Blue nodes represent semantic characteristics.
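
The comparison step described in the Figure 1 and Figure 3 captions (per-image KGs compared via node overlap, with shared attributes serving as the explanation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The captions say only "node overlap", so the Jaccard ratio used here is an assumption. The predicate names (`hasColor`, `hasSleeve`, `hasType`), the image ids, and the function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A triple is (subject, predicate, object); a per-image KG is a list of triples.
Triple = Tuple[str, str, str]


def kg_nodes(triples: List[Triple]) -> Set[str]:
    """Collect the node labels (subjects and objects) of one per-image KG."""
    nodes: Set[str] = set()
    for subj, _pred, obj in triples:
        nodes.add(subj.lower())
        nodes.add(obj.lower())
    return nodes


def node_overlap(kg_a: List[Triple], kg_b: List[Triple]) -> float:
    """Jaccard overlap of node sets (ASSUMPTION: the exact measure is not given)."""
    a, b = kg_nodes(kg_a), kg_nodes(kg_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shared_relations(kg_a: List[Triple], kg_b: List[Triple]) -> Set[Tuple[str, str]]:
    """Return (predicate, object) pairs present in both graphs (the 'why')."""
    rel_a = {(p, o.lower()) for _s, p, o in kg_a}
    rel_b = {(p, o.lower()) for _s, p, o in kg_b}
    return rel_a & rel_b


def rank_training_images(
    generated: List[Triple], training: Dict[str, List[Triple]]
) -> List[Tuple[str, float]]:
    """Rank training images by node overlap with the generated image's KG."""
    scored = [(img_id, node_overlap(generated, kg)) for img_id, kg in training.items()]
    # Highest score first; ties broken by image id for a deterministic order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical star-shaped KGs: one garment per image, as in the Figure 2 caption.
generated_kg = [
    ("garment", "hasColor", "red"),
    ("garment", "hasSleeve", "long"),
    ("garment", "hasType", "dress"),
]
training_kgs = {
    "img_001": [
        ("garment", "hasColor", "red"),
        ("garment", "hasSleeve", "short"),
        ("garment", "hasType", "dress"),
    ],
    "img_002": [
        ("garment", "hasColor", "blue"),
        ("garment", "hasSleeve", "short"),
        ("garment", "hasType", "shirt"),
    ],
}

ranking = rank_training_images(generated_kg, training_kgs)
explanation = shared_relations(generated_kg, training_kgs[ranking[0][0]])
print(ranking)
print(explanation)
```

In this toy data, `img_001` ranks first because it shares the `red` and `dress` nodes with the generated image, and `shared_relations` returns the matching attribute pairs. This is the kind of explanation that a single latent-embedding similarity score does not provide.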