GenEdit: Compounding Operators and Continuous Improvement to Tackle Text-to-SQL in the Enterprise

Karime Maamari, Connor Landy, Amine Mhedhbi

TL;DR

GenEdit targets enterprise Text-to-SQL by coupling a company-specific knowledge set with a two-module architecture: a decomposed SQL generation pipeline that uses compounding operators and a planning-based chain-of-thought approach, and an edits-recommendation module that iteratively refines the knowledge set through user feedback. The system constructs a contextual knowledge view during pre-processing, then reformulates the input and classifies its intent to retrieve relevant examples, instructions, and schema elements. It then generates SQL in two steps, first a natural-language plan and then the query, which minimizes the reasoning the LLM must do at generation time. Evaluation on the BIRD benchmark shows competitive performance and highlights the value of instructional guidance and context-aware retrieval, while ablations quantify the contribution of each component. The work demonstrates a practical pathway to continuous improvement in enterprise Text-to-SQL through automated feedback loops, SME collaboration, and regression testing, which lets the system handle high-complexity queries in real deployments.

Abstract

Recent advancements in Text-to-SQL, driven by large language models, are democratizing data access. Despite these advancements, enterprise deployments remain challenging due to the need to capture business-specific knowledge, handle complex queries, and meet expectations of continuous improvement. To address these issues, we designed and implemented GenEdit, our Text-to-SQL generation system that improves with user feedback. GenEdit builds and maintains a company-specific knowledge set, employs a pipeline of operators that decomposes SQL generation, and uses feedback to update its knowledge set and improve future SQL generations. We describe GenEdit's architecture, which consists of two core modules: (i) decomposed SQL generation; and (ii) knowledge set edits based on user feedback. For generation, GenEdit leverages compounding operators to improve knowledge retrieval and to create a plan, expressed as chain-of-thought steps, that guides generation. GenEdit first retrieves relevant examples in an initial retrieval stage where original SQL queries are decomposed into sub-statements, clauses, or sub-queries. It then retrieves instructions and schema elements. Using the retrieved contextual information, GenEdit generates a step-by-step plan in natural language describing how to produce the query. Finally, GenEdit uses the plan to generate SQL, minimizing the need for model reasoning, which enhances complex SQL generation. If necessary, GenEdit regenerates the query based on syntactic and semantic errors. The knowledge set edits are recommended through an interactive copilot, allowing users to iterate on their feedback and to regenerate SQL queries as needed. Each generation uses staged edits, which update the generation prompt. Once the feedback is submitted, it is merged after passing regression testing and obtaining approval, improving future generations.
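
The generation flow described in the abstract (retrieve context, write a plan, generate SQL from the plan, regenerate on errors) can be sketched as follows. This is a minimal illustration under stated assumptions, not GenEdit's implementation: `KnowledgeSet`, `retrieve`, `check_sql`, and `generate_sql` are hypothetical names, retrieval is a toy word-overlap ranking, the `llm` argument is any caller-supplied function from prompt to text, and validation uses SQLite's `EXPLAIN`, so it only catches syntax and schema errors in the SQLite dialect, not semantic ones.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class KnowledgeSet:
    """Hypothetical container for the three kinds of retrieved knowledge."""
    examples: List[str] = field(default_factory=list)      # decomposed SQL fragments
    instructions: List[str] = field(default_factory=list)  # natural-language business rules
    schema: List[str] = field(default_factory=list)        # CREATE TABLE statements


def retrieve(items: List[str], question: str, k: int = 3) -> List[str]:
    """Toy retrieval (an assumption): rank items by word overlap with the question."""
    words = set(question.lower().split())
    ranked = sorted(items, key=lambda s: -len(words & set(s.lower().split())))
    return ranked[:k]


def check_sql(sql: str, schema: List[str]) -> Optional[str]:
    """Return an error message if the SQL does not compile against the schema, else None."""
    conn = sqlite3.connect(":memory:")
    try:
        for ddl in schema:
            conn.execute(ddl)
        conn.execute("EXPLAIN " + sql)  # compiles the statement without needing data
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


def generate_sql(question: str, ks: KnowledgeSet,
                 llm: Callable[[str], str], max_attempts: int = 3) -> str:
    """Plan first, then generate SQL from the plan, retrying with error feedback."""
    examples = retrieve(ks.examples, question)
    instructions = retrieve(ks.instructions, question)
    context = "\n".join(examples + instructions + ks.schema)

    # Step 1: a natural-language plan built from the retrieved context.
    plan = llm(f"Write a step-by-step plan.\nQuestion: {question}\nContext:\n{context}")

    # Step 2: SQL generation guided by the plan, with regeneration on errors.
    prompt = f"Follow the plan to write SQL.\nPlan:\n{plan}\nContext:\n{context}"
    sql = ""
    for _ in range(max_attempts):
        sql = llm(prompt)
        error = check_sql(sql, ks.schema)
        if error is None:
            return sql
        prompt += f"\nPrevious attempt: {sql}\nError: {error}"
    return sql  # last attempt, possibly still invalid
```

Separating the plan from the SQL keeps each model call narrow: the second call mostly translates an explicit plan into syntax, and the error message appended on retry gives the model something concrete to correct.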

Paper Structure

This paper contains 26 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of GenEdit architecture showcasing its pipeline operators for SQL generation and edits recommendation modules.
  • Figure 2: Example of retrieved knowledge and plan generated for $Q_{fin-perf}$, which is then used for SQL prediction.
  • Figure 3: Example of GenEdit UI interfaces for the .
  • Figure 4: Example of a feedback displayed within the knowledge set library of GenEdit.