Cooperative SQL Generation for Segmented Databases By Using Multi-functional LLM Agents

Zhiguang Wu; Fengbin Zhu; Xuequn Shang; Yupei Zhang; Pan Zhou

TL;DR

CSMA is a cooperative framework in which multiple LLM-based agents each hold one partition of the database schema and iteratively collaborate to generate a correct SQL query for a natural language question. A global schema bridges inter-agent communication, and the process comprises three stages (schema collection, SQL generation, and correctness checking) that are repeated over multiple rounds until a query passes the check. On Spider and Bird, collaboration between agents that each hold one of two partial schemas approaches state-of-the-art performance while preserving data privacy. Ablations confirm the importance of the retention, exchange, and checking components, and in-context learning improves few-shot performance. The work advances practical Text-to-SQL for large, partitioned databases and suggests pathways toward privacy-preserving, federated database querying in industrial settings.

Abstract

The Text-to-SQL task aims to automatically produce SQL queries from users' natural language questions. To address this problem, we propose a Cooperative SQL Generation framework based on Multi-functional Agents (CSMA), which relies on information exchange among large language model (LLM) based agents that each own a separate part of the database schema. Inspired by collaboration in human teamwork, CSMA consists of three stages: 1) question-related schema collection, 2) question-corresponding SQL query generation, and 3) SQL query correctness check. In the first stage, agents analyze their respective schemas and communicate with each other to collect the schema information relevant to the question. In the second stage, agents try to generate the corresponding SQL query for the question using the collected information. In the third stage, agents check whether the SQL query is correct according to the information they know. This interaction-based method allows the question-relevant part of each agent's database schema to be used for SQL generation and checking. Experiments on the Spider and Bird benchmarks demonstrate that CSMA achieves performance comparable to the state of the art while keeping private data within the individual agents.
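
The three stages described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Agent` class, the keyword matching that stands in for LLM calls, the placeholder SQL template, and the `max_rounds` parameter are all assumptions introduced here for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent that can see only its own slice of the schema."""
    name: str
    schema: dict[str, list[str]]  # table name -> column names (private slice)

    def relevant_schema(self, question: str) -> dict[str, list[str]]:
        # Stand-in for an LLM call: keep tables whose name or columns
        # are mentioned in the question.
        q = question.lower()
        return {
            table: cols
            for table, cols in self.schema.items()
            if table in q or any(col in q for col in cols)
        }

    def accepts(self, question: str, sql: str) -> bool:
        # Stand-in for an LLM check: every table this agent considers
        # relevant must be mentioned in the query.
        return all(table in sql for table in self.relevant_schema(question))


def collect(agents: list[Agent], question: str) -> dict[str, list[str]]:
    """Stage 1: each agent shares only the question-relevant part of its slice."""
    shared: dict[str, list[str]] = {}
    for agent in agents:
        shared.update(agent.relevant_schema(question))
    return shared


def generate(shared: dict[str, list[str]]) -> str | None:
    """Stage 2 (placeholder): a real system would prompt an LLM here."""
    if not shared:
        return None
    tables = sorted(shared)
    columns = [f"{t}.{c}" for t in tables for c in shared[t]]
    return f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"


def check(agents: list[Agent], question: str, sql: str) -> bool:
    """Stage 3: the query is accepted only if every agent accepts it."""
    return all(agent.accepts(question, sql) for agent in agents)


def cooperate(agents: list[Agent], question: str, max_rounds: int = 3):
    """Repeat the three stages until a query passes the check or rounds run out."""
    for round_no in range(1, max_rounds + 1):
        shared = collect(agents, question)
        sql = generate(shared)
        if sql is not None and check(agents, question, sql):
            return sql, round_no
    return None, max_rounds
```

Because the stand-ins are deterministic, extra rounds cannot change the outcome in this sketch; with LLM-backed agents, each round would presumably give the agents a chance to revise what they share or generate. Note that only the question-relevant tables leave an agent in `collect`, which mirrors the abstract's point that the rest of each agent's schema stays private.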
