QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback-based Self-Correction
Xiang Huang, Sitao Cheng, Shanshan Huang, Jiayu Shen, Yong Xu, Chaoyun Zhang, Yuzhong Qu
TL;DR
This work tackles reliability and efficiency gaps in LLM-driven knowledge base question answering (KBQA) by introducing QueryAgent, a stepwise, tool-assisted reasoning framework, and ERASER, an environmental feedback-based self-correction mechanism. By constructing queries step by step in PyQL and applying targeted corrections guided by feedback from the KB, the Python interpreter, and the reasoning memory, the approach achieves state-of-the-art results with a single example while substantially reducing runtime and API costs compared to prior methods. Across GrailQA, GraphQ, WebQSP, and MetaQA-3Hop, QueryAgent outperforms 100-shot baselines, and its transfer to WikiSQL shows that it carries over to other semantic parsing tasks. Applying ERASER to AgentBench further improves that baseline, underscoring the practical value of environment-aware error discrimination. Overall, the framework offers a scalable, efficient path to reliable LLM-based KBQA.
Abstract
Employing Large Language Models (LLMs) for semantic parsing has achieved remarkable success. However, we find that existing methods fall short in reliability and efficiency when hallucinations are encountered. In this paper, we address these challenges with a framework called QueryAgent, which solves a question step by step and performs step-wise self-correction. We introduce an environmental feedback-based self-correction method called ERASER. Unlike traditional approaches, ERASER leverages rich environmental feedback in the intermediate steps to perform selective and differentiated self-correction only when necessary. Experimental results demonstrate that QueryAgent, using only one example, notably outperforms all previous few-shot methods on GrailQA and GraphQ by 7.0 and 15.0 F1 points, respectively. Moreover, our approach is more efficient in runtime, query overhead, and API invocation costs. By leveraging ERASER, we further improve another baseline (i.e., AgentBench) by approximately 10 points, revealing the strong transferability of our approach.
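The abstract characterizes ERASER as correcting selectively (only when environmental feedback signals an error) and in a differentiated way (depending on where that feedback comes from). The following is a minimal Python sketch of that control flow under stated assumptions; it is not the authors' implementation. The names `StepFeedback`, `diagnose`, and `run_step`, the three feedback fields, and the hint wording are all hypothetical, and treating an empty KB result as an error is a simplification made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class StepFeedback:
    """Hypothetical observations collected after executing one reasoning step."""
    python_error: Optional[str] = None     # exception message if the generated code failed
    kb_result_count: Optional[int] = None  # rows returned by a KB query, if one ran
    repeated_action: bool = False          # True if the action duplicates an earlier step in memory


def diagnose(fb: StepFeedback) -> Optional[str]:
    """Return a source-specific correction hint, or None when no error signal is present.

    Assumption: an empty KB result is treated as an error signal.
    """
    if fb.python_error is not None:
        return f"Python error: {fb.python_error}. Fix the syntax or arguments of this step."
    if fb.kb_result_count == 0:
        return "KB returned no results. Reconsider the relation or entity used in this step."
    if fb.repeated_action:
        return "This action repeats an earlier step. Choose a different action."
    return None  # no error detected, so no correction is triggered


def run_step(generate: Callable[[Optional[str]], str],
             execute: Callable[[str], StepFeedback],
             max_retries: int = 2) -> Tuple[str, int]:
    """Generate one step and regenerate it with a hint only when feedback signals an error.

    Returns the final action and the number of corrections that were attempted.
    """
    hint: Optional[str] = None
    action = ""
    for attempt in range(max_retries + 1):
        action = generate(hint)           # the LLM call would go here
        hint = diagnose(execute(action))  # run the step in the environment and inspect feedback
        if hint is None:
            return action, attempt
    return action, max_retries  # retries exhausted; the last action is returned as-is


# Usage with stubbed generation and execution (hypothetical relation names).
outputs = iter(["relation: wrong.relation", "relation: people.person.place_of_birth"])

def fake_generate(hint: Optional[str]) -> str:
    return next(outputs)

def fake_execute(action: str) -> StepFeedback:
    return StepFeedback(kb_result_count=0 if "wrong" in action else 3)

action, corrections = run_step(fake_generate, fake_execute)
print(action, corrections)  # relation: people.person.place_of_birth 1
```

Because correction is triggered only when `diagnose` returns a hint, a step that executes cleanly costs a single generation, which is one plausible way a selective scheme could save API calls relative to always-on self-reflection.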
