Enhancing Text-based Knowledge Graph Completion with Zero-Shot Large Language Models: A Focus on Semantic Enhancement

Rui Yang, Jiahao Zhu, Jianping Man, Li Fang, Yi Zhou

TL;DR

This work tackles text-based knowledge graph completion by addressing semantic completeness through CP-KGC, a framework of dataset-adaptive constrained prompts. It introduces semantic integrity discrimination, semantic compression, and semantic expansion to maintain rich semantics within model input limits, guided by contextual constraints to resolve polysemy. Across FB15k-237, WN18RR, and UMLS, CP-KGC improves text-based KGC performance and remains effective with quantized LLMs like Qwen-7B-Chat-int4, demonstrating practical viability and scalability. The findings highlight how carefully designed prompts and context-aware strategies can significantly boost semantic quality and predictive accuracy in KGC tasks, informing future KG–LLM integration efforts.
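The three steps named above (integrity discrimination, compression, expansion) and the contextual constraint can be illustrated with a small prompt-building sketch. This is not the authors' implementation. The word-count thresholds, the `Entity` type, the function names, and the prompt wording are all assumptions made for illustration, and a simple length check stands in for semantic integrity discrimination. No LLM is called; the sketch only shows how a constrained prompt might be assembled.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds in whitespace-separated words; not taken from the paper.
MIN_WORDS = 8
MAX_WORDS = 50


@dataclass
class Entity:
    name: str
    description: str
    # Known triples involving the entity, used as the contextual constraint.
    context: List[Tuple[str, str, str]]


def choose_action(description: str,
                  min_words: int = MIN_WORDS,
                  max_words: int = MAX_WORDS) -> str:
    """Crude stand-in for integrity discrimination: decide by length only."""
    n = len(description.split())
    if n < min_words:
        return "expand"    # too short: ask the LLM to add semantics
    if n > max_words:
        return "compress"  # too long for the input limit: ask for a summary
    return "keep"


def build_prompt(entity: Entity, max_words: int = MAX_WORDS) -> str:
    """Return a constrained prompt, or an empty string if no rewrite is needed."""
    action = choose_action(entity.description, max_words=max_words)
    if action == "keep":
        return ""
    if action == "compress":
        task = f"Summarize the description of '{entity.name}' in at most {max_words} words."
    else:
        task = f"Write a description of '{entity.name}' in at most {max_words} words."
    # The listed triples narrow down which sense of a polysemous name is meant.
    context_lines = "\n".join(f"- ({h}, {r}, {t})" for h, r, t in entity.context)
    return (
        f"{task}\n"
        f"Current description: {entity.description}\n"
        "Use these known facts to pick the intended sense of the entity:\n"
        f"{context_lines}\n"
        "Answer with the description only."
    )
```

For example, `build_prompt(Entity("bank", "sloping land", [("bank", "hypernym", "slope")]))` produces an expansion prompt whose context line steers the model toward the riverbank sense rather than the financial one.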

Abstract

The design and development of text-based knowledge graph completion (KGC) methods leveraging textual entity descriptions are at the forefront of research. These methods involve advanced optimization techniques such as soft prompts and contrastive learning to enhance KGC models. The effectiveness of text-based methods largely hinges on the quality and richness of the training data. Large language models (LLMs) can utilize straightforward prompts to alter text data, thereby enabling data augmentation for KGC. Nevertheless, LLMs typically demand substantial computational resources. To address these issues, we introduce a framework termed constrained prompts for KGC (CP-KGC). This CP-KGC framework designs prompts that adapt to different datasets to enhance semantic richness. Additionally, CP-KGC employs a context constraint strategy to effectively identify polysemous entities within KGC datasets. Through extensive experimentation, we have verified the effectiveness of this framework. Even after quantization, the LLM (Qwen-7B-Chat-int4) still enhances the performance of text-based KGC methods (code and datasets are available at https://github.com/sjlmg/CP-KGC). This study extends the performance limits of existing models and promotes further integration of KGC with LLMs.

Paper Structure (24 sections, 6 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: LLMs can add or remove content from entity descriptions.
  • Figure 2: CP-KGC semantic enhancement framework.
  • Figure 3: Comparing the impact of different maximum truncation lengths on model performance between CP-KGC and SimKGC.
  • Figure 4: Impact of context on the WN18RR dataset. The yellow section indicates that the predicted part-of-speech usage closely resembles typical usage patterns; the green section shows redundancy with the original entity descriptions.
  • Figure 5: Influence of context on the FB15k-237 dataset. The yellow section highlights fine-grained words introduced by the context; the green section indicates where predictions overlap with existing data.