Cooperative SQL Generation for Segmented Databases By Using Multi-functional LLM Agents

Zhiguang Wu, Fengbin Zhu, Xuequn Shang, Yupei Zhang, Pan Zhou

TL;DR

CSMA introduces a cooperative framework in which multiple LLM-based agents each hold a partition of the database schema and iteratively collaborate to generate correct SQL queries for natural-language questions. A global schema bridges inter-agent communication. The process comprises three stages (schema collection, SQL generation, and correctness checking), executed over multiple rounds until the check succeeds. Experiments on Spider and Bird show that, with the schema split into two parts, collaboration between partial-schema agents approaches state-of-the-art performance while preserving data privacy. Ablations confirm the importance of the retention, exchange, and checking components, and in-context learning improves few-shot performance. The work advances practical Text-to-SQL for large, partitioned databases and suggests pathways toward privacy-preserving, federated database querying in industrial settings.

Abstract

The Text-to-SQL task aims to automatically generate SQL queries from users' natural-language questions. To address this problem, we propose a Cooperative SQL Generation framework based on Multi-functional Agents (CSMA), in which large language model (LLM) based agents, each owning a separate part of the database schema, interact to exchange information. Inspired by collaboration in human teamwork, CSMA consists of three stages: 1) question-related schema collection, 2) question-corresponding SQL query generation, and 3) SQL query correctness check. In the first stage, agents analyze their respective schemas and communicate with each other to collect the schema information relevant to the question. In the second stage, agents try to generate the SQL query for the question using the collected information. In the third stage, agents check whether the SQL query is correct according to the information they know. This interaction-based method lets the question-relevant part of each agent's database schema be used for SQL generation and checking. Experiments on the Spider and Bird benchmarks demonstrate that CSMA achieves performance comparable to the state of the art while keeping private data within the individual agents.
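The three stages repeat over rounds until a check passes. The following is a minimal sketch of that control flow only, assuming round-robin selection of the working agent and assigning the check to the next agent in turn (as Figure 4 describes). The LLM-backed functions are replaced by rule-based placeholders; `keyword_collect`, `naive_generate`, `coverage_check`, `GlobalState`, `run_rounds`, and `max_rounds` are hypothetical names and do not reproduce the paper's prompts or implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# Hypothetical schema representation: table name -> set of column names.
Schema = Dict[str, Set[str]]


@dataclass
class Agent:
    name: str
    private_schema: Schema  # this agent's segment of the database schema


@dataclass
class GlobalState:
    # Shared state read and updated by whichever agent is working this round.
    schema: Schema = field(default_factory=dict)
    sql: Optional[str] = None


StageFn = Callable[[Agent, str, GlobalState], object]


def run_rounds(question: str, agents: List[Agent], collect: StageFn,
               generate: StageFn, check: StageFn, max_rounds: int = 10) -> Optional[str]:
    """Multi-round loop: collect schema, generate SQL, let another agent check it."""
    state = GlobalState()
    for r in range(max_rounds):
        worker = agents[r % len(agents)]         # round-robin choice (an assumption)
        checker = agents[(r + 1) % len(agents)]  # the next agent performs the check
        collect(worker, question, state)         # stage 1: schema collection
        state.sql = generate(worker, question, state)  # stage 2: SQL generation
        if state.sql is not None and check(checker, question, state):  # stage 3
            return state.sql
    return None  # no SQL passed the check within max_rounds


# --- Rule-based placeholders standing in for the LLM calls (hypothetical) ---

def keyword_collect(agent: Agent, question: str, state: GlobalState) -> None:
    # Share every private table whose name appears in the question.
    for table, cols in agent.private_schema.items():
        if table in question.lower():
            state.schema.setdefault(table, set()).update(cols)


def naive_generate(agent: Agent, question: str, state: GlobalState) -> Optional[str]:
    # Toy generator: select from all collected tables (not a real SQL model).
    if not state.schema:
        return None
    return "SELECT * FROM " + ", ".join(sorted(state.schema))


def coverage_check(agent: Agent, question: str, state: GlobalState) -> bool:
    # Reject the SQL if it omits a question-related table the checker knows.
    needed = [t for t in agent.private_schema if t in question.lower()]
    return all(re.search(rf"\b{re.escape(t)}\b", state.sql) for t in needed)


agents = [
    Agent("A", {"singer": {"singer_id", "name"}}),
    Agent("B", {"concert": {"concert_id", "singer_id", "concert_name"}}),
]
sql = run_rounds("Show each singer name with the concert name", agents,
                 keyword_collect, naive_generate, coverage_check)
print(sql)  # SELECT * FROM concert, singer
```

In this toy run, agent A's first query omits the `concert` table, so agent B rejects it; in the second round B contributes its table and A accepts the new query. The placeholder generator emits a cross join, so the returned SQL illustrates the loop rather than a correct answer; in CSMA each stage is an LLM call.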

Paper Structure

This paper contains 19 sections, 5 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The distribution of database tables across agents. Each agent holds a different part of the database, and these parts may overlap.
  • Figure 2: Overview of the CSMA framework, where in each round a working agent is selected to interact with the global state.
  • Figure 3: The procedure of question-related schema collection. The private schema and the global schema are merged into the known schema, from which the question-related part is extracted and merged into the global schema.
  • Figure 4: The procedure of question-corresponding SQL generation and SQL correctness check. The working agent uses the SQL Generation function to generate the SQL query, which is checked by the next agent according to its known schema.
  • Figure 5: An example of the whole Text-to-SQL process. Several agents take turns updating the global state until the check result is positive.
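The schema-collection step in Figure 3 amounts to two merges around an extraction. The following sketch assumes a schema is a mapping from table names to column sets and replaces the LLM's extraction with a hypothetical `is_relevant` predicate fixed for one question; `merge`, `collection_step`, and the toy tables are illustrative and are not the paper's code.

```python
from typing import Callable, Dict, Set

# Hypothetical schema representation: table name -> set of column names.
Schema = Dict[str, Set[str]]


def merge(a: Schema, b: Schema) -> Schema:
    """Column-level union; a table held by both schemas has its columns combined."""
    out = {t: set(cols) for t, cols in a.items()}
    for t, cols in b.items():
        out.setdefault(t, set()).update(cols)
    return out


def collection_step(private: Schema, global_schema: Schema,
                    is_relevant: Callable[[str, str], bool]) -> Schema:
    known = merge(private, global_schema)  # private + global -> known schema
    extracted = {t: {c for c in cols if is_relevant(t, c)} for t, cols in known.items()}
    extracted = {t: cols for t, cols in extracted.items() if cols}  # drop empty tables
    return merge(global_schema, extracted)  # extracted part -> updated global schema


# Two overlapping segments (both agents hold part of the "concert" table).
agent1 = {"singer": {"singer_id", "name", "age"}, "concert": {"concert_id", "singer_id"}}
agent2 = {"concert": {"concert_id", "concert_name", "year"}, "stadium": {"stadium_id", "capacity"}}

# Placeholder for the LLM's relevance judgement for one specific question.
RELEVANT = {("singer", "singer_id"), ("singer", "name"),
            ("concert", "singer_id"), ("concert", "concert_name")}


def is_relevant(table: str, column: str) -> bool:
    return (table, column) in RELEVANT


global_schema: Schema = {}
for private in (agent1, agent2):
    global_schema = collection_step(private, global_schema, is_relevant)
print(global_schema)
```

Agent 2 never holds `concert.singer_id` privately; it sees that join key only through the global schema, which illustrates how overlapping segments (Figure 1) can let agents bridge their partial views. The irrelevant `stadium` table never leaves agent 2.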