Multi-modal Knowledge Graph Generation with Semantics-enriched Prompts

Yajing Xu, Zhiqiang Liu, Jiaoyan Chen, Mingchen Tu, Zhuo Chen, Jeff Z. Pan, Yichi Zhang, Yushan Zhu, Wen Zhang, Huajun Chen

TL;DR

The paper tackles the challenge of enriching conventional knowledge graphs (KGs) with high-quality, contextually relevant images. It proposes a pipeline that selects visualizable and structurally informative neighbors of each entity, generates semantics-enriched prompts with an LLM, and synthesizes images with a diffusion model. The neighbor selection framework, Visualizable Structural Neighbor Selection (VSNS), comprises Visualizable Neighbor Selection (VNS) and Structural Neighbor Selection (SNS); combined with prompt generation and diffusion-based synthesis, it creates MMKGs-A from KGs. Evaluations on MKG-Y and DB15K show improved image quality (lower FID, higher CLIPScore) and stronger alignment with KG content, as well as positive downstream effects on multi-modal knowledge graph completion (MMKGC). The results support the viability of automated, neighbor-informed prompt generation and diffusion-based image synthesis for scalable MMKG construction; future work is to address abstract entities and broader downstream tasks.
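For readers unfamiliar with the alignment metric mentioned above: CLIPScore is commonly defined as a rescaled, clipped cosine similarity between a CLIP image embedding and a CLIP text embedding, w · max(cos(v, c), 0) with w = 2.5. The sketch below computes it on plain Python lists. The embeddings are made-up placeholders, not CLIP outputs, and whether the paper uses this exact rescaling is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def clip_score(image_emb, text_emb, w=2.5):
    """Rescaled cosine similarity, clipped at zero (w = 2.5 is the usual choice)."""
    return w * max(cosine(image_emb, text_emb), 0.0)

# Placeholder embeddings for illustration only.
image_emb = [1.0, 0.0]
aligned = clip_score(image_emb, [1.0, 0.0])    # same direction
opposed = clip_score(image_emb, [-1.0, 0.0])   # opposite direction, clipped to 0
```

A higher score means the generated image and its text description point in more similar directions in the embedding space, which is why the summary reports "higher CLIPScore" as an improvement.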

Abstract

Multi-modal Knowledge Graphs (MMKGs) have been widely applied across various domains for knowledge representation. However, existing MMKGs are far fewer than needed, and their construction faces numerous challenges, particularly in selecting high-quality, contextually relevant images for knowledge graph enrichment. To address these challenges, we present a framework for constructing MMKGs from conventional KGs. Furthermore, to generate higher-quality images that are more relevant to the context in the given knowledge graph, we design a neighbor selection method called Visualizable Structural Neighbor Selection (VSNS). This method consists of two modules: Visualizable Neighbor Selection (VNS) and Structural Neighbor Selection (SNS). The VNS module filters out relations that are difficult to visualize, while the SNS module selects the neighbors that most effectively capture the structural characteristics of the entity. To evaluate the quality of the generated images, we perform qualitative and quantitative evaluations on two datasets, MKG-Y and DB15K. The experimental results indicate that selecting neighbors with VSNS yields higher-quality images that are more relevant to the knowledge graph.
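The two-stage selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy triples, the `VISUALIZABLE` relation whitelist standing in for VNS, the inverse-frequency ranking standing in for SNS, and the prompt template are all hypothetical. According to the summary, the actual prompts are generated by an LLM and passed to a diffusion model.

```python
from collections import Counter

# Hypothetical toy KG as (head, relation, tail) triples.
TRIPLES = [
    ("Eiffel_Tower", "locatedIn", "Paris"),
    ("Eiffel_Tower", "madeOf", "Wrought_iron"),
    ("Eiffel_Tower", "hasWikiPageID", "9232"),
    ("Louvre", "locatedIn", "Paris"),
    ("Notre_Dame", "locatedIn", "Paris"),
]

# Assumed whitelist standing in for VNS: relations judged easy to depict.
VISUALIZABLE = {"locatedIn", "madeOf"}

def vns(triples, entity, visualizable):
    """Keep only the entity's outgoing neighbors reached via visualizable relations."""
    return [(r, t) for h, r, t in triples if h == entity and r in visualizable]

def sns(triples, candidates, k):
    """Stand-in structural score: prefer tails that few triples point to,
    on the assumption that rarer neighbors are more distinctive for the entity."""
    tail_freq = Counter(t for _, _, t in triples)
    ranked = sorted(candidates, key=lambda rt: (tail_freq[rt[1]], rt[1]))
    return ranked[:k]

def build_prompt(entity, neighbors):
    """Hypothetical template; the paper delegates prompt writing to an LLM."""
    facts = ", ".join(f"{r} {t}" for r, t in neighbors)
    return f"A photo of {entity}" + (f" ({facts})" if facts else "")

selected = sns(TRIPLES, vns(TRIPLES, "Eiffel_Tower", VISUALIZABLE), k=1)
prompt = build_prompt("Eiffel_Tower", selected)
```

In this toy graph the `hasWikiPageID` triple is dropped by the VNS stand-in, and the SNS stand-in prefers `Wrought_iron` over `Paris` because `Paris` is shared by three entities and so says less about this one.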

Paper Structure

This paper contains 32 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Illustration of multi-modal knowledge graphs.
  • Figure 2: The framework of translating KGs into MMKGs.
  • Figure 3: Examples of distilling knowledge from LLM.
  • Figure 4: Example of $\text{I}_m$ with lower CIE.
  • Figure 5: (a) Example of $\text{I}_s$ with lower CIE. (b) Examples of generated images of landscapes or places. (c) Example of $\text{I}_{svns}$ or $\text{I}_m$ with lower IQ.