Learning to Refine: An Agentic RL Approach for Iterative SPARQL Query Construction

Floris Vossebeld; Shenghui Wang

TL;DR

This work tackles multi-hop KGQA by reframing SPARQL construction as an iterative, agentic decision process. A compact LLM is fine-tuned with Group Relative Policy Optimization to learn a think–act–observe policy that improves query refinement through execution feedback, without supervised demonstrations. On a curated LC-QuAD 2.0 subset, the RL-tuned agent achieves strong gains in accuracy and query executability, and ablations show that deliberate reasoning provides a meaningful boost. The approach demonstrates how interaction with a symbolic knowledge graph can bridge probabilistic LLM reasoning and structured data, offering a generalizable blueprint for agentic, tool-using reasoning in KGQA and related symbolic tasks.

Abstract

Generating complex, logically sound SPARQL queries for multi-hop questions remains a critical bottleneck for Knowledge Graph Question Answering (KGQA), as the brittle nature of one-shot generation by Large Language Models (LLMs) hinders reliable interaction with structured data. Current methods lack the adaptive policies needed to debug queries dynamically based on real-time execution feedback. This paper introduces an agentic framework in which an LLM learns a resilient policy for the sequential process of iterative SPARQL construction. We show that a compact 3B-parameter model, trained exclusively via outcome-driven Reinforcement Learning with Group Relative Policy Optimization (GRPO) and without supervised fine-tuning, can learn effective policies for this task, discovering how to systematically recover from execution errors and refine its queries toward a correct answer. On a curated, executable single-answer subset of LC-QuAD 2.0, our agent achieves 49.7% accuracy post-entity-linking, a 17.5 percentage point improvement over the strongest iterative zero-shot baseline. Further analysis reveals that while the agent's capability is driven by RL, its performance is enhanced by an explicit deliberative reasoning step that acts as a cognitive scaffold to improve policy precision. This work presents a generalizable blueprint for teaching agents to master formal, symbolic tools through interaction, bridging the gap between probabilistic LLMs and the structured world of Knowledge Graphs.
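
As a rough illustration of the outcome-driven GRPO signal mentioned above, the sketch below computes group-relative advantages: each rollout's reward is normalized against the mean and standard deviation of the other rollouts sampled for the same question. This is only the generic GRPO normalization step, not the authors' training code; the function name `group_relative_advantages`, the `eps` constant, and the example rewards are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    `rewards` holds one scalar outcome reward per rollout sampled for the
    same question. `eps` (an assumed constant) avoids division by zero
    when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical outcome rewards for four rollouts of one question:
# 1.0 if the final query returned the correct answer, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one, so no learned value function is needed. When all rollouts in a group earn the same reward, every advantage is zero and that group contributes no gradient signal.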
