Grammar-based Neural Text-to-SQL Generation
Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, Matt Gardner
TL;DR
Grammar-based decoding with a dynamic schema-dependent SQL grammar reduces over-generation and improves text-to-SQL performance on ATIS and Spider. The model employs a two-tier rule system (global and linked) and runtime constraints, augmented by an identifier linking mechanism to map natural language to database entities. Experiments show notable gains over token-based baselines, including 4.5 percentage points in denotation accuracy on ATIS and 14.1 percentage points in exact component matching on Spider, with error analyses highlighting datetime parsing and linking as key challenges. Overall, the work demonstrates the value of context-sensitive, schema-aware generation and zero-shot generalization in semantic parsing for SQL.
Abstract
The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar. Grammar-based decoding has shown significant improvements for other semantic parsing tasks, but SQL and other general programming languages have complexities not present in logical formalisms that make writing hierarchical grammars difficult. We introduce techniques to handle these complexities, showing how to construct a schema-dependent grammar with minimal over-generation. We analyze these techniques on ATIS and Spider, two challenging text-to-SQL datasets, demonstrating that they yield 14-18% relative reductions in error.
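To make the idea of a schema-dependent grammar concrete, the following is a minimal Python sketch, not the paper's actual grammar or implementation. It assumes a tiny hand-written set of global rules (shared across databases) and derives linked rules from one database's tables, columns, and values. All names here (`GLOBAL_RULES`, `build_grammar`, `valid_actions`, `derive`) and the toy schema are hypothetical, and the sketch omits the runtime constraints and the neural scoring of actions.

```python
from typing import Dict, List

Grammar = Dict[str, List[List[str]]]

# Hypothetical global rules: SQL structure that does not depend on the database.
GLOBAL_RULES: Grammar = {
    "query": [
        ["SELECT", "col_ref", "FROM", "table_name"],
        ["SELECT", "col_ref", "FROM", "table_name", "WHERE", "condition"],
    ],
    "condition": [["col_ref", "=", "value"]],
}


def build_grammar(schema: Dict[str, List[str]], values: List[str]) -> Grammar:
    """Combine the global rules with linked rules derived from one schema."""
    grammar = {lhs: [list(rhs) for rhs in rules] for lhs, rules in GLOBAL_RULES.items()}
    # Linked rules: one production per table, column, and candidate value.
    grammar["table_name"] = [[t] for t in sorted(schema)]
    grammar["col_ref"] = [[f"{t}.{c}"] for t in sorted(schema) for c in schema[t]]
    grammar["value"] = [[v] for v in values]
    return grammar


def valid_actions(grammar: Grammar, nonterminal: str) -> List[str]:
    """List the productions a decoder may choose from for a nonterminal."""
    return [f"{nonterminal} -> {' '.join(rhs)}" for rhs in grammar[nonterminal]]


def derive(grammar: Grammar, choices: List[int], start: str = "query") -> str:
    """Expand the leftmost nonterminal with each chosen rule index in turn."""
    stack = [start]
    tokens: List[str] = []
    remaining = iter(choices)
    while stack:
        symbol = stack.pop()
        if symbol in grammar:
            rhs = grammar[symbol][next(remaining)]
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        else:
            tokens.append(symbol)  # terminal: emit as a SQL token
    return " ".join(tokens)


# Toy schema loosely in the style of a flights database (an assumption).
schema = {"flight": ["flight_id", "airline_code"], "airport": ["airport_code"]}
grammar = build_grammar(schema, ["'AA'"])
print(valid_actions(grammar, "table_name"))
print(derive(grammar, [1, 1, 1, 0, 2, 0]))
# SELECT flight.flight_id FROM flight WHERE flight.airline_code = 'AA'
```

Because every decoding step picks among the productions of a single nonterminal, the decoder in this sketch cannot emit a table or column that is absent from the schema, which is the kind of over-generation a schema-dependent grammar is meant to reduce. Swapping in a different schema changes only the linked rules, while the global rules stay fixed.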
