GenEdit: Compounding Operators and Continuous Improvement to Tackle Text-to-SQL in the Enterprise
Karime Maamari, Connor Landy, Amine Mhedhbi
TL;DR
GenEdit targets enterprise Text-to-SQL by coupling a company-specific knowledge set with a two-module architecture: a decomposed SQL generation pipeline that uses compounding operators and a planning-based chain-of-thought approach, and an edits-recommendation module that iteratively refines the knowledge set through user feedback. The system constructs a contextual knowledge view during pre-processing, then reformulates each input and classifies its intent to retrieve relevant examples, instructions, and schema elements. It generates SQL in two steps, first a natural-language plan and then the query itself, which reduces the reasoning the LLM must do at generation time. Evaluation on the BIRD benchmark shows competitive performance and highlights the value of instructional guidance and context-aware retrieval, while ablations quantify the contribution of each component. The work demonstrates a practical pathway to continuous improvement in enterprise Text-to-SQL through automated feedback loops, SME collaboration, and rigorous regression testing, enabling the handling of high-complexity queries in real deployments.
Abstract
Recent advancements in Text-to-SQL, driven by large language models, are democratizing data access. Despite these advancements, enterprise deployments remain challenging due to the need to capture business-specific knowledge, handle complex queries, and meet expectations of continuous improvement. To address these issues, we designed and implemented GenEdit, a Text-to-SQL generation system that improves with user feedback. GenEdit builds and maintains a company-specific knowledge set, employs a pipeline of operators that decomposes SQL generation, and uses feedback to update its knowledge set to improve future SQL generations. We describe GenEdit's architecture, which consists of two core modules: (i) decomposed SQL generation; and (ii) knowledge set edits based on user feedback. For generation, GenEdit leverages compounding operators to improve knowledge retrieval and to create a plan, expressed as chain-of-thought steps, that guides generation. GenEdit first retrieves relevant examples in an initial retrieval stage where original SQL queries are decomposed into sub-statements, clauses, or sub-queries. It then retrieves instructions and schema elements. Using the retrieved contextual information, GenEdit generates a step-by-step plan in natural language describing how to produce the query. Finally, GenEdit uses the plan to generate SQL, which reduces the reasoning the model must do at generation time and improves the generation of complex SQL. If necessary, GenEdit regenerates the query based on syntactic and semantic errors. The knowledge set edits are recommended through an interactive copilot, allowing users to iterate on their feedback and to regenerate SQL queries as needed. Each generation uses the staged edits, which update the generation prompt. Once the feedback is submitted, it is merged into the knowledge set after it passes regression testing and is approved, improving future generations.
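The plan-then-generate flow with regeneration on errors can be illustrated with a minimal sketch. This is not GenEdit's implementation: the function names (`check_sql`, `generate_sql`, `stub_plan`, `stub_write`), the dictionary layout of the retrieved context, and the use of SQLite's `EXPLAIN` as the validity check are all assumptions made for illustration. Retrieval is assumed to have already happened, the two LLM calls are replaced by deterministic stubs, and only syntactic and schema errors are detected, not semantic ones.

```python
import sqlite3
from typing import Callable, Dict, List, Optional


def check_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Compile the query without running it; return the error text, or None if it compiles."""
    try:
        conn.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        return str(exc)
    return None


def generate_sql(
    question: str,
    context: Dict[str, List[str]],
    make_plan: Callable[[str, Dict[str, List[str]]], List[str]],
    write_sql: Callable[[List[str], Dict[str, List[str]], Optional[str]], str],
    conn: sqlite3.Connection,
    max_attempts: int = 3,
) -> str:
    """Plan once in natural language, then write SQL, retrying with the last error."""
    plan = make_plan(question, context)
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = write_sql(plan, context, error)
        error = check_sql(conn, sql)
        if error is None:
            return sql
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {error}")


# Hypothetical stand-ins for the two LLM calls (assumptions, not the paper's prompts).
calls: List[Optional[str]] = []


def stub_plan(question: str, context: Dict[str, List[str]]) -> List[str]:
    return ["Read from the orders table", "Sum the amount column"]


def stub_write(plan: List[str], context: Dict[str, List[str]], error: Optional[str]) -> str:
    calls.append(error)
    if error is None:
        # First attempt deliberately references a column that does not exist.
        return "SELECT SUM(amt) FROM orders"
    # Regeneration after seeing the error message.
    return "SELECT SUM(amount) FROM orders"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

# Assumed shape of the retrieved context: examples, instructions, schema elements.
context = {
    "examples": [],
    "instructions": ["Revenue means SUM(amount)"],
    "schema": ["orders(id, amount)"],
}

sql = generate_sql("What is total revenue?", context, stub_plan, stub_write, conn)
print(sql)
```

In this sketch the plan is produced once and reused across retries, so each regeneration only has to repair the SQL rather than redo the reasoning; whether GenEdit also revises the plan on failure is not stated in the abstract.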
