KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs

Elan Markowitz; Krupa Galiya; Greg Ver Steeg; Aram Galstyan

TL;DR

KG-LLM-Bench introduces a scalable, extensible benchmark for evaluating how LLMs reason over textualized knowledge graphs across five tasks. It systematically compares five KG textualization formats, seven LLMs, and a pseudonymization regime, revealing that encoding choices significantly influence performance and token efficiency. The framework combines subgraph sampling, deterministic question generation, and exact-match scoring. The results offer practical guidance for designing knowledge-augmented LLM systems and highlight directions for future research in scalable KG reasoning and test-time inference.

Abstract

Knowledge graphs have emerged as a popular method for injecting up-to-date, factual knowledge into large language models (LLMs). This is typically achieved by converting the knowledge graph into text that the LLM can process in context. While multiple methods of encoding knowledge graphs have been proposed, the impact of this textualization process on LLM performance remains under-explored. We introduce KG-LLM-Bench, a comprehensive and extensible benchmark spanning five knowledge graph understanding tasks, and evaluate how different encoding strategies affect performance across various base models. Our extensive experiments with seven language models and five textualization strategies provide insights for optimizing LLM performance on KG reasoning tasks.
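
To make the idea of textualization concrete, the following minimal sketch renders the same toy knowledge graph in two different text encodings. The triples, the function names (`to_edge_list`, `to_structured_json`), and the two formats shown are illustrative assumptions; they are not taken from the benchmark's code and are not necessarily among the five strategies it evaluates.

```python
import json

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "European Union"),
    ("Paris", "located_in", "Europe"),
]


def to_edge_list(triples):
    """Encode the graph as one '(head, relation, tail)' line per edge."""
    return "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)


def to_structured_json(triples):
    """Encode the graph as JSON, grouping edges by head entity and relation."""
    grouped = {}
    for h, r, t in triples:
        grouped.setdefault(h, {}).setdefault(r, []).append(t)
    return json.dumps(grouped, indent=2)


if __name__ == "__main__":
    edge_text = to_edge_list(TRIPLES)
    json_text = to_structured_json(TRIPLES)
    print(edge_text)
    print(json_text)
    # Character count is only a crude stand-in for token count.
    print(len(edge_text), len(json_text))
```

Both strings carry the same facts, yet they differ in length and layout, which is the kind of difference in encoding that the benchmark is designed to measure.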
