KGQuest: Template-Driven QA Generation from Knowledge Graphs with LLM-Based Refinement
Sania Nayab, Marco Simoni, Giulio Rossolini, Andrea Saracino
TL;DR
KGQuest tackles scalable QA generation from knowledge graphs by combining a deterministic, template-driven pipeline with a lightweight, LLM-based refinement stage. Triplets are clustered by relation to produce reusable templates, which are instantiated with subject and object entities and augmented with KG-derived distractors; an optional per-template refinement with small LLMs improves fluency while preserving factual content. Evaluations across Wikigraphs, WebQSP, and CWQ show 80–90% correctness for templated questions, with refinement reducing linguistic errors and yielding substantial efficiency gains over direct, per-triplet LLM generation. The approach offers a transparent, scalable framework for cross-domain KG QA generation, with practical implications for education, benchmarking, and LLM evaluation. It also points toward extensions such as difficulty-aware distractors and broader domain generalization.
Abstract
The generation of questions and answers (QA) from knowledge graphs (KGs) plays a crucial role in the development and testing of educational platforms, dissemination tools, and large language models (LLMs). However, existing approaches often struggle with scalability, linguistic quality, and factual consistency. This paper presents a scalable and deterministic pipeline for generating natural language QA from KGs, with an additional LLM-based refinement step that further enhances linguistic quality. The approach first clusters KG triplets by relation and creates reusable templates through natural language rules derived from the entity types of objects and relations. A module then uses LLMs to refine these templates, improving clarity and coherence while preserving factual accuracy. Finally, answer options are instantiated through a selection strategy that introduces distractors from the KG. Our experiments demonstrate that this hybrid approach efficiently generates high-quality QA pairs, combining scalability with fluency and linguistic precision.
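To make the pipeline described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes toy triplets, a single generic template rule (the paper derives templates from entity types, which is not reproduced here), and distractors sampled from other objects of the same relation. The names `TRIPLETS`, `cluster_by_relation`, `make_template`, `pick_distractors`, and `generate_qa` are all hypothetical, and the optional `refine` callback stands in for the LLM refinement step.

```python
import random
from collections import defaultdict

# Toy (subject, relation, object) triplets; illustrative data only.
TRIPLETS = [
    ("France", "capital", "Paris"),
    ("Italy", "capital", "Rome"),
    ("Spain", "capital", "Madrid"),
    ("Japan", "capital", "Tokyo"),
    ("Hamlet", "author", "William Shakespeare"),
    ("Ulysses", "author", "James Joyce"),
    ("Emma", "author", "Jane Austen"),
    ("Dracula", "author", "Bram Stoker"),
]


def cluster_by_relation(triplets):
    """Group (subject, object) pairs under their shared relation."""
    clusters = defaultdict(list)
    for subject, relation, obj in triplets:
        clusters[relation].append((subject, obj))
    return dict(clusters)


def make_template(relation):
    """Assumed generic rule: one question pattern per relation."""
    return f"What is the {relation} of {{subject}}?"


def pick_distractors(clusters, relation, answer, k, rng):
    """Sample up to k wrong options from other objects of the same relation."""
    pool = sorted({obj for _, obj in clusters[relation] if obj != answer})
    return rng.sample(pool, min(k, len(pool)))


def generate_qa(triplets, k=3, seed=0, refine=None):
    """Build multiple-choice items; `refine` is called once per template."""
    rng = random.Random(seed)
    clusters = cluster_by_relation(triplets)
    items = []
    for relation, pairs in clusters.items():
        template = make_template(relation)
        if refine is not None:
            # Placeholder for LLM refinement; it must keep the {subject} slot.
            template = refine(template)
        for subject, answer in pairs:
            options = pick_distractors(clusters, relation, answer, k, rng)
            options.append(answer)
            rng.shuffle(options)
            items.append({
                "question": template.format(subject=subject),
                "options": options,
                "answer": answer,
            })
    return items
```

Because `refine` runs once per relation rather than once per triplet, this toy input of eight triplets triggers only two refinement calls. That is the structural reason a per-template design can be cheaper than generating each question directly with an LLM.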
