QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback-based Self-Correction

Xiang Huang, Sitao Cheng, Shanshan Huang, Jiayu Shen, Yong Xu, Chaoyun Zhang, Yuzhong Qu

TL;DR

This work tackles reliability and efficiency gaps in LLM-driven KBQA by introducing QueryAgent, a stepwise, tool-assisted reasoning framework, and ERASER, an environmental feedback-based self-correction mechanism. By leveraging PyQL-based query construction and targeted corrections guided by KB, Python, and reasoning-memory feedback, the approach achieves state-of-the-art results with a single example and substantially reduces runtime and API costs compared to prior methods. Across GrailQA, GraphQ, WebQSP, and MetaQA-3Hop, QueryAgent outperforms 100-shot baselines, and its transfer to WikiSQL demonstrates versatility on new semantic parsing tasks. Applying ERASER to an existing baseline, AgentBench, improves its performance by approximately 10 points, underscoring the practical value of environment-aware error discrimination. Overall, the framework offers a scalable, efficient path to reliable LLM-based KBQA with strong transferability to other semantic parsing contexts.

Abstract

Employing Large Language Models (LLMs) for semantic parsing has achieved remarkable success. However, we find existing methods fall short in terms of reliability and efficiency when hallucinations are encountered. In this paper, we address these challenges with a framework called QueryAgent, which solves a question step-by-step and performs step-wise self-correction. We introduce an environmental feedback-based self-correction method called ERASER. Unlike traditional approaches, ERASER leverages rich environmental feedback in the intermediate steps to perform selective and differentiated self-correction only when necessary. Experimental results demonstrate that QueryAgent notably outperforms all previous few-shot methods using only one example on GrailQA and GraphQ by 7.0 and 15.0 F1. Moreover, our approach exhibits superiority in terms of efficiency, including runtime, query overhead, and API invocation costs. By leveraging ERASER, we further improve another baseline (i.e., AgentBench) by approximately 10 points, revealing the strong transferability of our approach.

Paper Structure

This paper contains 36 sections, 7 figures, 13 tables, and 1 algorithm.

Figures (7)

  • Figure 1: QueryAgent compared with two mainstream KBQA paradigms employing LLMs.
  • Figure 2: An example of QueryAgent and ERASER. At each step, the LLM generates a thought and an action based on the previous steps. Based on the action's execution status (KB and Python) and the reasoning memory, ERASER detects whether an error exists. If no error is detected, the observation for this step is the execution result on the KB (i.e., guideline *), and the LLM continues normal reasoning. Otherwise, the observation is the corresponding self-correction guideline (e.g., guideline A/B/C), and the LLM performs self-correction.
  • Figure 3: Prompt of GrailQA (Task description and tools document).
  • Figure 4: Prompt of GrailQA (1-shot example and new question).
  • Figure 5: A reasoning and self-correction example of GrailQA.
  • ...and 2 more figures
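
The step described in the Figure 2 caption (execute the action, check the three feedback sources, then return either the KB result or a correction guideline as the observation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `StepResult`, `detect_error`, `observe`, the detection rules, and the guideline texts are all hypothetical names and placeholders chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    """Outcome of executing one action (hypothetical structure)."""
    kb_result: Optional[str] = None     # what the KB returned, if anything
    python_error: Optional[str] = None  # error text from the interpreter, if any


def detect_error(action: str, result: StepResult, memory: List[str]) -> Optional[str]:
    """Return a self-correction guideline if an error is detected, else None.

    The three checks correspond to the three feedback sources named in the
    caption (Python, KB, reasoning memory). The rules themselves are
    placeholder assumptions, not the paper's actual criteria.
    """
    if result.python_error is not None:
        return f"[python] The action failed with {result.python_error}; revise the call."
    if not result.kb_result:
        return "[kb] The KB returned nothing; reconsider the entity or relation."
    if action in memory:
        return "[memory] This action repeats an earlier step; try something else."
    return None


def observe(action: str,
            execute: Callable[[str], StepResult],
            memory: List[str]) -> str:
    """Produce the observation for one reasoning step.

    With no detected error the observation is the KB execution result;
    otherwise it is the matching guideline, so correction is triggered
    only when it is needed.
    """
    result = execute(action)
    guideline = detect_error(action, result, memory)
    memory.append(action)  # record the action for later memory checks
    return guideline if guideline is not None else (result.kb_result or "")
```

In this sketch the caller would feed the returned observation back to the LLM as the next prompt turn; how the actual framework phrases its guidelines and decides between them is specified in the paper's prompts, not here.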