
Retrieval-Augmented Language Model for Extreme Multi-Label Knowledge Graph Link Prediction

Yu-Hsiang Lin, Huang-Ting Shieh, Chih-Yu Liu, Kuang-Ting Lee, Hsiao-Cheng Chang, Jing-Lun Yang, Yu-Sheng Lin

TL;DR

This work tackles hallucination and high fine-tuning costs in LLMs by introducing a retrieval-augmented framework for extreme multi-label knowledge graph link prediction. It reformulates KG link prediction as a multi-label task and uses a four-stage retrieval pipeline that pairs concise entity descriptions with one-hop neighbors selected by a SimCSE-based retriever, feeding the augmented input to a fine-tuned BERT encoder. A three-stage loss schedule with a precision term addresses the challenges of high-dimensional label spaces, yielding higher $P@k$ on WN18RR and FB15k-237 than extreme multi-label text classification (XMTC) baselines. The results show that KG-specific augmentation strategies, particularly the inclusion of textual descriptions and targeted one-hop information, significantly improve extrapolation with a relatively small parameter footprint.
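$P@k$ is conventionally defined in extreme multi-label evaluation as the fraction of the top-$k$ predicted labels that are correct. The sketch below assumes that conventional definition (always dividing by $k$) and a plain average over test instances. The function names `precision_at_k` and `mean_precision_at_k` are hypothetical and are not taken from the authors' repository.

```python
from typing import List, Sequence, Set


def precision_at_k(ranked_labels: Sequence[str], gold_labels: Set[str], k: int) -> float:
    """P@k for one instance: number of hits among the top-k predictions, divided by k."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_labels[:k]
    hits = sum(1 for label in top_k if label in gold_labels)
    return hits / k


def mean_precision_at_k(
    rankings: List[Sequence[str]], golds: List[Set[str]], k: int
) -> float:
    """Average P@k over a dataset of (ranked predictions, gold label set) pairs."""
    if len(rankings) != len(golds) or not rankings:
        raise ValueError("rankings and golds must be non-empty and equally sized")
    total = sum(precision_at_k(r, g, k) for r, g in zip(rankings, golds))
    return total / len(rankings)


if __name__ == "__main__":
    # One (r, t) query whose gold heads are h1, h2, h3, as in Figure 1.
    ranking = ["h1", "h4", "h2", "h5"]
    gold = {"h1", "h2", "h3"}
    for k in (1, 3, 4):
        print(f"P@{k} = {precision_at_k(ranking, gold, k):.3f}")
```

Because the denominator is always $k$, an instance with fewer than $k$ gold labels cannot reach $P@k = 1$. This is the usual convention in XMTC evaluation.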

Abstract

Extrapolation in large language models (LLMs) for open-ended inquiry faces two pivotal issues: (1) hallucination and (2) expensive training costs. These issues are especially problematic for LLMs applied to specialized domains and personalized data, which require truthful responses and low fine-tuning costs. Existing works attempt to address the problem by augmenting the input of a smaller language model with information from a knowledge graph (KG). However, they have two limitations: (1) they fail to extract relevant information from large one-hop neighborhoods in the KG, and (2) they apply the same augmentation strategy to KGs with different characteristics, which may result in low performance. Moreover, open-ended inquiry typically yields multiple responses, which further complicates extrapolation. We propose a new task, extreme multi-label KG link prediction, that enables a model to extrapolate multiple responses from structured real-world knowledge. Our retriever identifies relevant one-hop neighbors by considering entity, relation, and textual data together. Our experiments demonstrate that (1) KGs with different characteristics require different augmentation strategies, and (2) augmenting the language model's input with textual data significantly improves task performance. By combining retrieval augmentation with a KG, our framework can extrapolate from a given KG with a small parameter size. The code is available on GitHub: https://github.com/exiled1143/Retrieval-Augmented-Language-Model-for-Multi-Label-Knowledge-Graph-Link-Prediction.git
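The abstract says the retriever ranks one-hop neighbors by considering entity, relation, and textual data together. As a rough illustration only, the sketch below serializes the query (entity, description, relation) and each neighbor triple (with available descriptions) into text, embeds both, and keeps the top-$n$ neighbors by cosine similarity. The paper uses a SimCSE-based encoder; here a bag-of-words count vector is swapped in so the example runs without model weights. The names `embed`, `cosine`, `retrieve_neighbors`, and the serialization order are hypothetical, and the paper's four-stage selection is not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def embed(text: str) -> Counter:
    # Placeholder encoder: token counts. The paper uses a SimCSE-based
    # encoder instead; this stand-in only keeps the sketch self-contained.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing tokens, so absent terms contribute nothing.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_neighbors(
    entity: str,
    entity_desc: str,
    relation: str,
    neighbors: List[Triple],
    descriptions: Dict[str, str],
    top_n: int,
) -> List[Triple]:
    """Rank one-hop neighbor triples against the query (entity, description, relation)."""
    query = embed(f"{entity} {entity_desc} {relation}")
    scored = []
    for h, r, t in neighbors:
        # Hypothetical serialization: h D_h r t D_t (missing descriptions become empty).
        text = f"{h} {descriptions.get(h, '')} {r} {t} {descriptions.get(t, '')}"
        scored.append((cosine(query, embed(text)), (h, r, t)))
    # Python's sort is stable, so tied scores keep their original neighbor order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [triple for _, triple in scored[:top_n]]


if __name__ == "__main__":
    neighbors = [
        ("dog", "has_part", "tail"),
        ("dog", "member_of", "canis"),
        ("puppy", "hypernym", "dog"),
    ]
    descriptions = {
        "tail": "posterior appendage",
        "canis": "genus of canine animals",
        "puppy": "young dog",
    }
    print(retrieve_neighbors("dog", "domestic canine animal", "hypernym",
                             neighbors, descriptions, top_n=2))
```

Swapping `embed` for a real sentence encoder would change the scores but not the ranking-and-truncation logic.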

Paper Structure

This paper contains 16 sections, 10 equations, 2 figures, and 8 tables.

Figures (2)

  • Figure 1: An Illustration of the Extreme Multi-label Knowledge Graph Link Prediction Task. Each $h_i$, $i \in \{1, 2, 3\}$, denotes the head entity of a triple, and $t$ denotes a tail entity. Consider three triples that share the same relation and tail $t$. They can be reformulated as a single multi-label instance by giving the relation and tail as input raw text, with $h_{1}$, $h_{2}$, and $h_{3}$ as the corresponding labels.
  • Figure 2: An Illustration of the Proposed Framework. $h$: head entity, $r$: relation, $t$: tail entity, $[CLS]$: [CLS] token, $[MASK]$: [MASK] token, $[SEP]$: [SEP] token, $d$: description of a given node in an incomplete triple, $D$: description of a one-hop neighbor, $d_{h}$: description of the given head entity in an incomplete triple, $D_{h}$: description of the head entity in a one-hop neighbor, $D_{t}$: description of the tail entity in a one-hop neighbor, and $\|$: concatenation. This figure shows the format of the data at each stage of our framework. Given an input triple $\coloneqq h\|r\|[MASK]$, our framework produces the triple with 1-hop context $\coloneqq [CLS]\|h\|d_{h}\|r\|[MASK]\|[SEP]\|\text{filtered 1-hop}$, which serves as the input raw text of the BERT model.
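Figures 1 and 2 describe two data transformations: grouping triples that share a query into one multi-label instance, and serializing the query plus filtered one-hop neighbors into BERT input text. The sketch below illustrates both under stated assumptions: `to_multilabel` and `build_input` are hypothetical names, each neighbor is serialized as $h \| D_h \| r \| t \| D_t$ (our reading of the symbols in Figure 2, not a confirmed format), and the special tokens are written literally as in the figure.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def to_multilabel(
    triples: List[Triple], query_side: str = "tail"
) -> Dict[Tuple[str, str], List[str]]:
    """Group triples that share a query into one multi-label instance.

    query_side="tail": key (r, t), labels are the heads (the Figure 1 setting).
    query_side="head": key (h, r), labels are the tails (the h || r || [MASK] query).
    """
    if query_side not in ("head", "tail"):
        raise ValueError("query_side must be 'head' or 'tail'")
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for h, r, t in triples:
        if query_side == "tail":
            groups[(r, t)].add(h)
        else:
            groups[(h, r)].add(t)
    # Sort labels so each instance has a deterministic label order.
    return {key: sorted(labels) for key, labels in groups.items()}


def build_input(
    head: str,
    head_desc: str,
    relation: str,
    filtered_neighbors: List[Triple],
    descriptions: Dict[str, str],
) -> str:
    """Serialize [CLS] || h || d_h || r || [MASK] || [SEP] || filtered 1-hop."""
    parts = ["[CLS]", head, head_desc, relation, "[MASK]", "[SEP]"]
    for h, r, t in filtered_neighbors:
        # Assumed neighbor layout: h D_h r t D_t; empty descriptions are dropped below.
        parts += [h, descriptions.get(h, ""), r, t, descriptions.get(t, "")]
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    triples = [("h1", "r", "t"), ("h2", "r", "t"), ("h3", "r", "t"), ("h4", "r2", "t")]
    print(to_multilabel(triples))
    print(build_input("dog", "domestic canine animal", "hypernym",
                      [("puppy", "hypernym", "dog")], {"puppy": "young dog"}))
```

With an actual BERT tokenizer you would typically let the tokenizer insert [CLS] and [SEP] rather than writing them into the string, since tokenizers usually add these special tokens by default.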