RECIPE-TKG: From Sparse History to Structured Reasoning for LLM-based Temporal Knowledge Graph Completion

Ömer Faruk Akgül, Feiyu Zhu, Yuxin Yang, Rajgopal Kannan, Viktor Prasanna

TL;DR

RECIPE-TKG tackles the challenge of forecasting in temporal knowledge graphs with sparse historical evidence by integrating three targeted components: rule-based multi-hop history sampling to enrich grounding, contrastive fine-tuning of lightweight LoRA adapters to encode relational semantics, and test-time semantic filtering to enforce contextual consistency. This framework yields stronger relational grounding and reduces hallucinations, especially in low-context queries, outperforming both embedding-based and prior LLM-based methods across four benchmarks with up to $30.6\%$ relative gains in $Hits@10$. By systematically analyzing grounding, generalization, and evaluation, the work demonstrates that carefully designed retrieval, training objectives, and inference-time checks can significantly boost the reliability and plausibility of LLM-based TKG forecasters without large-scale retraining. The approach advances practical structured reasoning in foundation models for temporally dynamic knowledge, with implications for forecasting and decision support in domains where historical evidence is sparse or indirect.

Abstract

Temporal Knowledge Graphs (TKGs) represent dynamic facts as timestamped relations between entities. TKG completion involves forecasting missing or future links, requiring models to reason over time-evolving structure. While LLMs show promise for this task, existing approaches often overemphasize supervised fine-tuning and struggle particularly when historical evidence is limited or missing. We introduce RECIPE-TKG, a lightweight and data-efficient framework designed to improve accuracy and generalization in settings with sparse historical context. It combines (1) rule-based multi-hop retrieval for structurally diverse history, (2) contrastive fine-tuning of lightweight adapters to encode relational semantics, and (3) test-time semantic filtering to iteratively refine generations based on embedding similarity. Experiments on four TKG benchmarks show that RECIPE-TKG outperforms previous LLM-based approaches, achieving up to 30.6% relative improvement in Hits@10. Moreover, our proposed framework produces more semantically coherent predictions, even for samples with limited historical context.
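For readers unfamiliar with the headline metric, the following is a minimal sketch of how Hits@k and a relative improvement are commonly computed. It is an illustration rather than the authors' evaluation code: the function names `hits_at_k` and `relative_gain`, the input layout (one ranked candidate list per query), and the toy data are all assumptions introduced here.

```python
from typing import Sequence


def hits_at_k(ranked: Sequence[Sequence[str]], targets: Sequence[str], k: int = 10) -> float:
    """Fraction of queries whose gold entity appears among the top-k candidates.

    `ranked[i]` is a hypothetical ranked candidate list for query i,
    and `targets[i]` is the gold entity for that query.
    """
    if len(ranked) != len(targets):
        raise ValueError("ranked and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(1 for preds, gold in zip(ranked, targets) if gold in preds[:k])
    return hits / len(targets)


def relative_gain(new_score: float, baseline_score: float) -> float:
    """Relative improvement of new_score over baseline_score (0.25 means +25%)."""
    return (new_score - baseline_score) / baseline_score


# Toy data (made up for illustration): two queries, one hit within the top 2.
toy_ranked = [["Germany", "France"], ["Iran", "Iraq"]]
toy_targets = ["France", "Syria"]
toy_score = hits_at_k(toy_ranked, toy_targets, k=2)
```

Under this definition, a relative improvement is a ratio of scores, so it is not the same as an absolute difference in Hits@10 percentage points.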

Paper Structure

This paper contains 64 sections, 18 equations, 15 figures, 8 tables, 2 algorithms.

Figures (15)

  • Figure 1: Example of LLM-based TKG reasoning. Prior methods rely on 1-hop historical context, leading to memorization or hallucination. RECIPE-TKG incorporates richer structural and relational context by sampling and filtering, enabling more plausible predictions.
  • Figure 2: Prediction failures under sparse or shallow history. (a) Accuracy vs. history length shows longer contexts support better reasoning. (b) Most non-historical targets require multi-hop reasoning, but are unreachable with 1-hop sampling. (c) Accuracy drops sharply on non-historical predictions for both ICL and SFT.
  • Figure 3: Overview of RECIPE-TKG. RECIPE-TKG follows a three-stage framework: (1) History Sampling, which retrieves query-relevant facts via a two-phase strategy combining rule-based retrieval and context-guided expansion; (2) Contrastive Learning, which jointly optimizes entity embeddings using contrastive and cross-entropy losses. Positive/negative pairs are sampled from the subgraph, and embeddings are generated via a learnable encoder; (3) Test-time Filtering, where predicted entities are iteratively verified by a semantic filter. Unsatisfactory outputs are refined using a statistical generator until confident predictions are obtained.
  • Figure 4: Distribution of semantic similarity to the input context for correctly and incorrectly classified samples.
  • Figure 5: Hits@10 grouped by number of historical facts. RECIPE-TKG consistently outperforms ICL and GenTKG across all history lengths, with particularly strong improvements when the input history is sparse.
  • ...and 10 more figures
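The test-time filtering stage described in the Figure 3 caption can be pictured with a minimal sketch. This is not the paper's implementation: the threshold value, the `embed` callable, the candidate ordering (the model's prediction first, then fallback candidates from some generator), and the function names are all hypothetical, chosen only to show the idea of accepting a prediction when its embedding is similar enough to the context.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_prediction(
    candidates: Sequence[str],
    context_vec: Vector,
    embed: Callable[[str], Vector],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> Optional[str]:
    """Return the first candidate whose similarity to the context meets the
    threshold; if none does, fall back to the most similar candidate."""
    best, best_sim = None, -1.0
    for cand in candidates:
        sim = cosine(embed(cand), context_vec)
        if sim >= threshold:
            return cand
        if sim > best_sim:
            best, best_sim = cand, sim
    return best


# Toy embeddings (made up): "B" is the initial prediction, the rest are fallbacks.
toy_embeddings = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.9, 0.1]}
accepted = filter_prediction(["B", "C", "A"], [1.0, 0.0], toy_embeddings.__getitem__, threshold=0.8)
```

In this toy run the initial prediction "B" is rejected because it is orthogonal to the context vector, and the first fallback that clears the threshold is returned instead. The loop is a simplification: the caption describes iterative refinement with a statistical generator, whose details are not given in this overview.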