Retrieval-Augmented Generation Meets Data-Driven Tabula Rasa Approach for Temporal Knowledge Graph Forecasting

Geethan Sannidhi, Sagar Srinivas Sakhinana, Venkataramana Runkana

TL;DR

The paper tackles trustworthy temporal knowledge graph forecasting in the face of hallucinations and data leakage by proposing sLA-tKGF, a retrieval-augmented framework that trains a small-scale language model from scratch (tabula rasa). Predictions are grounded via knowledge-infused prompts that integrate historical tKG data, web-sourced context, and PLLM-generated historical summaries. Through extensive experiments on ICEWS, WIKI, YAGO, and ACLED datasets, the approach achieves state-of-the-art performance, with ablations confirming the critical roles of historical knowledge retrieval, web context, and PLLM-derived descriptions. The work demonstrates the practical viability of scalable, interpretable tKG forecasting with reduced bias and improved traceability, suitable for real-world applications requiring reliable temporal reasoning.

Abstract

Pre-trained large language models (PLLMs) such as OpenAI ChatGPT and Google Gemini face several challenges in temporal Knowledge Graph (tKG) forecasting, including inaccurate factual recall, hallucinations, biases, and future data leakage. To address these issues, we introduce sLA-tKGF (small-scale language assistant for tKG forecasting), which combines Retrieval-Augmented Generation (RAG) with small-scale language models custom-trained from scratch (a tabula rasa approach) for effective tKG forecasting. Our framework constructs knowledge-infused prompts from relevant historical tKG data, web search results, and PLLM-generated textual descriptions that capture historical entity relationships prior to the target time. These external knowledge-infused prompts supply context-specific semantic and temporal information, which the small-scale language models use through zero-shot prompting to reason about and more accurately predict future events within tKGs. By capturing how trends change over time, the framework reduces hallucinations and mitigates distributional shift. As a result, it produces more accurate and contextually grounded forecasts of future events while minimizing computational demands. Rigorous empirical studies demonstrate our framework's robustness, scalability, and state-of-the-art (SOTA) performance on benchmark datasets, along with interpretable and trustworthy tKG forecasting.
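The prompt-construction step described in the abstract can be illustrated with a minimal sketch. It assumes tKG facts are stored as (subject, relation, object, timestamp) quadruples, that historical facts are ranked by recency, and that web snippets carry a timestamp so they can be filtered with the same cutoff; none of these details, nor the prompt template, come from the paper. The names `Quadruple`, `retrieve_history`, and `build_prompt` are hypothetical, and the PLLM summary is passed in as a plain string rather than produced by any real API call. The key idea shown is that only evidence dated strictly before the target time enters the prompt, which is how retrieval avoids future data leakage.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Quadruple:
    """A tKG fact: (subject, relation, object, timestamp). Hypothetical schema."""
    subject: str
    relation: str
    obj: str
    timestamp: int


def retrieve_history(
    facts: Sequence[Quadruple], subject: str, target_time: int, k: int = 5
) -> List[Quadruple]:
    """Return the k most recent facts about `subject` strictly before target_time.

    The strict `timestamp < target_time` filter keeps future facts out of the
    prompt. Ranking by recency is an assumption made for this sketch.
    """
    past = [f for f in facts if f.subject == subject and f.timestamp < target_time]
    past.sort(key=lambda f: f.timestamp, reverse=True)
    return past[:k]


def build_prompt(
    facts: Sequence[Quadruple],
    subject: str,
    relation: str,
    target_time: int,
    web_snippets: Sequence[Tuple[int, str]],
    pllm_summary: str,
    k: int = 5,
) -> str:
    """Assemble a knowledge-infused prompt for the query (subject, relation, ?, target_time)."""
    history = retrieve_history(facts, subject, target_time, k)
    lines = ["Historical events:"]
    # List retrieved facts in chronological order.
    lines += [f"{f.timestamp}: [{f.subject}, {f.relation}, {f.obj}]" for f in reversed(history)]
    lines.append("Web context:")
    # Assumption: web snippets are dated and filtered with the same cutoff.
    lines += [f"- {text}" for ts, text in web_snippets if ts < target_time]
    lines.append("Background summary:")
    lines.append(pllm_summary)  # stands in for a PLLM-generated description
    lines.append(f"Query: {target_time}: [{subject}, {relation}, ?]")
    return "\n".join(lines)


# Toy usage with made-up entities and timestamps.
facts = [
    Quadruple("A", "negotiate", "B", 1),
    Quadruple("A", "sign_agreement", "C", 3),
    Quadruple("A", "threaten", "D", 7),  # after the target time: excluded
]
prompt = build_prompt(
    facts, "A", "negotiate", 5,
    web_snippets=[(2, "A and B held talks."), (6, "Report published after the target time.")],
    pllm_summary="A has a history of diplomatic engagement.",
)
print(prompt)
```

The resulting string would then be given to the small-scale language model, which is expected to fill in the missing object. A real system would likely use learned relevance scoring rather than pure recency.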

Paper Structure (28 sections, 2 equations, 1 figure, 19 tables)

Figures (1)

  • Figure 1: The sLA-tKGF framework combines information retrieved through web scraping, historical knowledge from tKGs, and PLLM-generated descriptions of historical entity relationships to construct knowledge-augmented prompts, which enable small-scale language models to achieve high factual accuracy and reliability in tKG forecasting.