Finetuning Generative Large Language Models with Discrimination Instructions for Knowledge Graph Completion
Yang Liu, Xiaobin Tian, Zequn Sun, Wei Hu
TL;DR
This work tackles knowledge graph completion with large language models and avoids grounding errors by finetuning an open LLM with discrimination instructions. It combines candidate entities from a lightweight embedding model, truncated sampling that reduces the instruction data while preserving informative examples, and KG embeddings injected into the LLM to strengthen graph reasoning. The proposed DIFT framework achieves state-of-the-art results on FB15K-237 and WN18RR, outperforming both embedding-based and generation-based baselines while remaining computationally efficient through LoRA/QLoRA and candidate-based prompting. These findings suggest that finetuning with discrimination instructions can unlock robust KG reasoning in LLMs at practical cost, and may guide future work on KGQA and entity alignment.
Abstract
Traditional knowledge graph (KG) completion models learn embeddings to predict missing facts. Recent works attempt to complete KGs in a text-generation manner with large language models (LLMs). However, these approaches must ground the LLM output to KG entities, which inevitably introduces errors. In this paper, we present a finetuning framework, DIFT, which aims to unleash the KG completion ability of LLMs while avoiding grounding errors. Given an incomplete fact, DIFT employs a lightweight model to obtain candidate entities and finetunes an LLM with discrimination instructions to select the correct one from the given candidates. To improve performance while reducing instruction data, DIFT uses a truncated sampling method to select useful facts for finetuning and injects KG embeddings into the LLM. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed framework.
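The candidate-then-discriminate idea can be illustrated with a minimal sketch. It assumes a lightweight model has already scored the entities for an incomplete fact; the scores, the prompt wording, and the helper names (top_k_candidates, build_discrimination_prompt, resolve_answer) are hypothetical and not taken from the paper. The sketch shows why restricting the answer to a candidate list sidesteps free-text grounding: the answer either matches a listed KG entity or is rejected.

```python
from typing import Dict, List, Optional


def top_k_candidates(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring entity names (higher score = more plausible)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def build_discrimination_prompt(head: str, relation: str, candidates: List[str]) -> str:
    """Format an incomplete fact (head, relation, ?) and its candidates as an instruction.

    The wording is a hypothetical template, not the one used in the paper.
    """
    lines = [f"Incomplete fact: ({head}, {relation}, ?)", "Candidates:"]
    lines += [f"{i}. {name}" for i, name in enumerate(candidates, start=1)]
    lines.append("Select the candidate that correctly completes the fact.")
    return "\n".join(lines)


def resolve_answer(answer: str, candidates: List[str]) -> Optional[str]:
    """Map the model's text answer back to a candidate; None if it matches none."""
    cleaned = answer.strip().lower()
    for name in candidates:
        if name.lower() == cleaned:
            return name
    return None


# Hypothetical scores from a lightweight embedding model for (Paris, capital_of, ?).
scores = {"France": 0.91, "Germany": 0.40, "Europe": 0.65, "Lyon": 0.12}
candidates = top_k_candidates(scores, k=3)
prompt = build_discrimination_prompt("Paris", "capital_of", candidates)
print(prompt)
```

In this toy setup, the finetuning target for the prompt would be the correct candidate's name, and resolve_answer stands in for the check that the generated answer is one of the listed entities.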
