Lucy: Think and Reason to Solve Text-to-SQL

Nina Narodytska, Shay Vargaftik

TL;DR

Lucy introduces a three-phase framework that decouples natural-language understanding from automated relational reasoning to solve text-to-SQL on large, complex enterprise databases. By encoding the database schema as a dbModel and using MatchTables to identify relevant objects, GenerateView to synthesize a constraint-respecting view, and QueryView to generate final SQL from that view, Lucy achieves strong zero-shot performance on challenging benchmarks. Experimental results across ACME Insurance, BIRD, and Cloud Resources demonstrate Lucy's superior coverage and competitive execution accuracy compared to zero-shot baselines, while revealing practical failure modes and debugging opportunities. The work advances practical, explainable text-to-SQL for enterprise schemas by combining LLM capabilities with formal reasoning, avoiding fine-tuning while enabling scalable handling of complex database designs.
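The three-phase split described above can be illustrated with a toy sketch. This is not the authors' implementation: the schema, the keyword-based `match_tables`, the breadth-first join search in `generate_view` (standing in for the automated reasoner), and the hard-coded template in `query_view` (standing in for the final LLM call) are all assumptions made purely for illustration.

```python
import sqlite3
from collections import deque

# Hypothetical toy schema (not from the paper): table -> columns.
SCHEMA = {
    "customer": ["id", "name"],
    "policy": ["id", "customer_id", "premium"],
    "claim": ["id", "policy_id", "amount"],
}
# Hypothetical foreign-key edges: (child, parent) -> join condition.
FOREIGN_KEYS = {
    ("policy", "customer"): "policy.customer_id = customer.id",
    ("claim", "policy"): "claim.policy_id = policy.id",
}

def match_tables(question, schema):
    """Phase 1 stand-in for the LLM: keep tables named in the question."""
    words = question.lower().replace("?", " ").split()
    return [t for t in schema if any(w.startswith(t) for w in words)]

def join_path(start, goal, fks):
    """Breadth-first search for a chain of foreign-key joins from start to goal."""
    adjacency = {}
    for (a, b), cond in fks.items():
        adjacency.setdefault(a, []).append((b, cond))
        adjacency.setdefault(b, []).append((a, cond))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, cond in adjacency.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, cond)]))
    raise ValueError(f"no join path from {start} to {goal}")

def generate_view(tables, schema, fks):
    """Phase 2 stand-in for the reasoner: build one view joining the matched tables."""
    base, joined, joins = tables[0], [tables[0]], []
    for target in tables[1:]:
        for table, cond in join_path(base, target, fks):
            if table not in joined:
                joined.append(table)
                joins.append(f"JOIN {table} ON {cond}")
    # Alias every column as table_column so names in the view stay unique.
    cols = ", ".join(f"{t}.{c} AS {t}_{c}" for t in joined for c in schema[t])
    return " ".join([f"SELECT {cols} FROM {base}"] + joins)

def query_view(view_sql):
    """Phase 3 stand-in for the LLM: a fixed aggregate over the view."""
    return (f"WITH v AS ({view_sql}) "
            "SELECT customer_name, SUM(claim_amount) AS total "
            "FROM v GROUP BY customer_name ORDER BY customer_name")

def run_demo():
    """Run all three phases against a small in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER, name TEXT);
        CREATE TABLE policy (id INTEGER, customer_id INTEGER, premium REAL);
        CREATE TABLE claim (id INTEGER, policy_id INTEGER, amount REAL);
        INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO policy VALUES (10, 1, 100.0), (11, 2, 200.0);
        INSERT INTO claim VALUES (100, 10, 50.0), (101, 10, 25.0), (102, 11, 10.0);
    """)
    tables = match_tables("What is the total claim amount per customer?", SCHEMA)
    sql = query_view(generate_view(tables, SCHEMA, FOREIGN_KEYS))
    rows = conn.execute(sql).fetchall()
    conn.close()
    return tables, rows
```

The point of the split is visible even in this toy: the question mentions only `customer` and `claim`, and the intermediate `policy` table is supplied by the join search rather than by the language-understanding step.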

Abstract

Large Language Models (LLMs) have made significant progress in assisting users to query databases in natural language. While LLM-based techniques provide state-of-the-art results on many standard benchmarks, their performance drops significantly when applied to large enterprise databases. The reason is that these databases have a large number of tables with complex relationships that are challenging for LLMs to reason about. We analyze the challenges that LLMs face in these settings and propose a new solution that combines the power of LLMs in understanding questions with automated reasoning techniques to handle complex database constraints. Based on these ideas, we have developed a new framework that outperforms state-of-the-art techniques in zero-shot text-to-SQL on complex benchmarks.

Paper Structure

This paper contains 102 sections, 3 equations, 3 figures, 7 tables, 2 algorithms.

Figures (3)

  • Figure 1: Objects and their relations in the database ddo.
  • Figure 2: Lucy's high-level workflow. Red colored boxes indicate phases performed by LLMs, and a green colored box is a phase performed by an automated reasoner.
  • Figure 3: A part of the abstract schema graph $G$ for ddo that includes core tables.

Theorems & Definitions (6)

  • Example 3.1
  • Example 3.2
  • Example D.1
  • Example D.2
  • Example D.3
  • Example D.4: Full version of Example \ref{exm:onetwo:match} for the question Q2