
Knowledge Graph-Enhanced Large Language Models via Path Selection

Haochen Liu, Song Wang, Yaochen Zhu, Yushun Dong, Jundong Li

TL;DR

This work addresses factual inaccuracies in LLM outputs by augmenting prompts with knowledge paths extracted from an external KG $G=(\mathcal{E},\mathcal{R},\mathcal{T})$. It introduces KELP, a three-stage framework comprising knowledge path extraction, sample encoding, and fine-grained path selection. A latent semantic path-text encoder and two coverage rules let it capture both direct and indirect semantics, and an optional Relation-Only Ranking step scales the approach to very large KGs. A pairwise training objective optimizes the path-text encoder to align with potentially impactful knowledge. Experiments on MetaQA and FACTKG show KELP surpassing LLM-based evidence baselines and approaching fully supervised baselines in many settings. The approach improves factual accuracy and remains robust in few-shot regimes, offering a scalable, flexible tool for KG-enhanced LLMs in QA and fact verification.
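
To make the idea of a pairwise training objective concrete, here is a minimal sketch using a generic margin ranking loss. This summary does not state the paper's exact loss, so the function names, the margin value, and the toy scores below are all assumptions chosen for illustration. The intent is only to show how a higher-ranked (more impactful) path can be pushed above a lower-ranked one.

```python
from typing import List, Tuple


def pairwise_margin_loss(pos_score: float, neg_score: float, margin: float = 1.0) -> float:
    """Generic margin ranking loss (an assumption, not the paper's stated loss).

    The loss is zero once the preferred path outscores the other path
    by at least `margin`, and grows linearly otherwise.
    """
    return max(0.0, margin - (pos_score - neg_score))


def mean_pairwise_loss(pairs: List[Tuple[float, float]], margin: float = 1.0) -> float:
    """Average the pairwise loss over (preferred_score, other_score) pairs."""
    if not pairs:
        return 0.0
    return sum(pairwise_margin_loss(p, n, margin) for p, n in pairs) / len(pairs)


# Hypothetical encoder scores for two (preferred path, other path) pairs.
pairs = [(2.0, 0.5), (0.75, 0.25)]
losses = [pairwise_margin_loss(p, n) for p, n in pairs]
mean_loss = mean_pairwise_loss(pairs)
print(losses, mean_loss)
```

The first pair is already separated by more than the margin and contributes no loss; the second pair is not, so it would produce a training signal for the encoder.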

Abstract

Large Language Models (LLMs) have shown unprecedented performance in various real-world applications. However, they are known to generate factually inaccurate outputs, a.k.a. the hallucination problem. In recent years, incorporating external knowledge extracted from Knowledge Graphs (KGs) has become a promising strategy to improve the factual accuracy of LLM-generated outputs. Nevertheless, most existing explorations rely on LLMs themselves to perform KG knowledge extraction, which is highly inflexible because LLMs can only provide a binary judgment on whether a certain piece of knowledge (e.g., a knowledge path in the KG) should be used. In addition, LLMs tend to pick only knowledge with a direct semantic relationship to the input text, while potentially useful knowledge with indirect semantics can be ignored. In this work, we propose a principled framework, KELP, with three stages to handle the above problems. Specifically, KELP achieves finer-grained, flexible knowledge extraction by generating scores for knowledge paths against input texts via latent semantic matching. Meanwhile, knowledge paths with indirect semantic relationships to the input text can also be considered via a trained encoding between the selected paths in the KG and the input text. Experiments on real-world datasets validate the effectiveness of KELP.
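
The contrast between a binary keep/drop judgment and continuous path scores can be sketched as follows. This is an illustrative sketch only: it assumes the input text and each candidate path have already been mapped to vectors by some encoder, and it uses cosine similarity with a top-k cutoff as a stand-in for latent semantic matching. The helper names, the `top_k` and `min_score` parameters, and the toy vectors and path labels are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_paths(
    text_vec: Sequence[float],
    path_vecs: List[Sequence[float]],
    paths: List[str],
    top_k: int = 2,
    min_score: float = 0.1,
) -> List[Tuple[float, str]]:
    """Score every path against the text, drop low scores, keep the top_k best."""
    scored = [(cosine(text_vec, vec), path) for vec, path in zip(path_vecs, paths)]
    scored = [item for item in scored if item[0] >= min_score]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for encoder outputs (made-up values).
text_vec = [1.0, 0.0, 1.0]
paths = ["path_a", "path_b", "path_c"]
path_vecs = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

selected = select_paths(text_vec, path_vecs, paths)
for score, path in selected:
    print(f"{path}: {score:.3f}")
```

Because each path receives a real-valued score, the cutoff can be tuned (by `top_k`, by threshold, or both) rather than being fixed by a yes/no decision per path.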

Paper Structure

This paper contains 23 sections, 8 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: An example of the phenomenon that semantically unrelated contexts in the input prompts can possibly contain important knowledge to correct/improve the generation of large language models. In this example, there exist potential relationships between "Japan" and "Akita Inu" that are challenging to directly identify and capture.
  • Figure 2: The overall pipeline of the proposed KELP. During the inference phase, we identify knowledge paths from the knowledge graph that are associated with the entities present in the input question. An encoder is then trained to select valuable paths as knowledge contexts. Finally, the selected knowledge contexts, along with the input question, are input into the LLM to generate the final answer.
  • Figure 3: Comparison between the baseline GPT no-context, KG-GPT (LLM-based evidence), and our proposed method KELP on the FACTKG dataset and MetaQA dataset w.r.t. different shots in the learning setting.