A Multi-agent Text2SQL Framework using Small Language Models and Execution Feedback
Thanh Dat Hoang, Thanh Trung Huynh, Matthias Weidlich, Thanh Tam Nguyen, Tong Chen, Hongzhi Yin, Quoc Viet Hung Nguyen
TL;DR
This paper tackles the privacy and cost limitations of external LLMs for Text2SQL by introducing MATS, a multi-agent framework that leverages small language models to generate SQL via specialized agents (Schema Insight, Planner, Validator, Fix, Selection) and execution feedback. It introduces Reinforcement Learning from Execution Feedback (RLEF) with ORPO-based alignment to harmonize agent outputs using database responses rather than human labels, and augments training with manually labeled data and few-shot prompts on Spider and BIRD. Empirically, MATS matches or exceeds open-source baselines and approaches closed-source performance on Spider and BIRD benchmarks while operating on a single 24GB GPU with 9B parameters. The work demonstrates robust performance, thoughtful data and prompt design, and a practical path to cost-effective, privacy-preserving Text2SQL deployments.
Abstract
Text2SQL, the task of generating SQL queries from natural language text, is a critical challenge in data engineering. Recently, Large Language Models (LLMs) have demonstrated superior performance for this task due to their advanced comprehension and generation capabilities. However, privacy and cost considerations prevent companies from using Text2SQL solutions based on external LLMs offered as a service. Rather, small LLMs (SLMs) that are openly available and can be hosted in-house are adopted. These SLMs, in turn, lack the generalization capabilities of larger LLMs, which impairs their effectiveness for complex tasks such as Text2SQL. To address these limitations, we propose MATS, a novel Text2SQL framework designed specifically for SLMs. MATS uses a multi-agent mechanism that assigns specialized roles to auxiliary agents, reducing individual workloads and fostering interaction. A training scheme based on reinforcement learning aligns these agents using feedback obtained during execution, thereby maintaining competitive performance despite a limited LLM size. Evaluation results on benchmark datasets show that MATS, deployed on a single-GPU server, yields accuracy that is on par with large-scale LLMs while using significantly fewer parameters. Our source code and data are available at https://github.com/thanhdath/mats-sql.
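
To make the idea of execution feedback more concrete, the following is a minimal sketch of how database responses could be turned into preference pairs for ORPO-style alignment. It is not the authors' implementation: the function names (`execute`, `build_preference_pairs`), the `prompt`/`chosen`/`rejected` record layout, the toy `singer` table, and the assumption that a reference query is available for each training question are all hypothetical choices made for illustration. Candidates whose result set matches the reference are treated as preferred; candidates that fail to execute or return a different result are treated as rejected.

```python
import sqlite3
from itertools import product


def execute(conn, sql):
    """Run a query and return (ok, payload).

    payload is the order-insensitive result set on success,
    or the database error message on failure.
    """
    try:
        rows = conn.execute(sql).fetchall()
        return True, sorted(rows, key=repr)
    except sqlite3.Error as exc:
        return False, str(exc)


def build_preference_pairs(conn, question, candidates, reference_sql):
    """Label candidate queries by execution and pair good ones with bad ones.

    Assumption: a reference query exists for the question, so the label
    comes from the database response rather than a human preference rating.
    """
    ok, expected = execute(conn, reference_sql)
    if not ok:
        raise ValueError(f"reference query failed: {expected}")

    chosen, rejected = [], []
    for sql in candidates:
        ok, result = execute(conn, sql)
        if ok and result == expected:
            chosen.append(sql)
        else:
            rejected.append(sql)

    # Every (correct, incorrect) combination becomes one preference record.
    return [
        {"prompt": question, "chosen": good, "rejected": bad}
        for good, bad in product(chosen, rejected)
    ]


# Toy database, invented for this example only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 31), (2, 'Bob', 45), (3, 'Cy', 27);
    """
)

question = "List the names of singers older than 30."
reference_sql = "SELECT name FROM singer WHERE age > 30"
candidates = [
    "SELECT name FROM singer WHERE age > 30",   # matches the reference result
    "SELECT name FROM singer WHERE age >= 27",  # executes, wrong rows
    "SELECT nme FROM singer WHERE age > 30",    # execution error (bad column)
]

pairs = build_preference_pairs(conn, question, candidates, reference_sql)
for pair in pairs:
    print(pair["chosen"], "  preferred over  ", pair["rejected"])
```

In this sketch the first candidate is paired against each of the other two, giving two preference records. A real pipeline would sample the candidates from the agents themselves and would likely need timeouts and read-only connections when executing untrusted SQL.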
