Global Reasoning over Database Structures for Text-to-SQL Parsing
Ben Bogin, Matt Gardner, Jonathan Berant
TL;DR
The paper addresses zero-shot text-to-SQL parsing, where the parser must select database (DB) constants from schemas that were not seen during training. It introduces a globally-aware parser with two components: a gating graph neural network that softly selects relevant DB constants by passing messages over the schema graph, conditioned on the question, and a re-ranking module that scores candidate queries by the global alignment of their DB constants to the question words. Applied to the then state-of-the-art model on the Spider dataset, these components raise accuracy from 39.4% to 47.4%. The results suggest that combining global structural reasoning for constant selection with query-level re-ranking substantially improves zero-shot semantic parsing.
Abstract
State-of-the-art semantic parsers rely on auto-regressive decoding, emitting one symbol at a time. When tested against complex databases that are unobserved at training time (zero-shot), the parser often struggles to select the correct set of database constants in the new database, due to the local nature of decoding. In this work, we propose a semantic parser that globally reasons about the structure of the output query to make a more contextually-informed selection of database constants. We use message-passing through a graph neural network to softly select a subset of database constants for the output query, conditioned on the question. Moreover, we train a model to rank queries based on the global alignment of database constants to question words. We apply our techniques to the current state-of-the-art model for Spider, a zero-shot semantic parsing dataset with complex databases, increasing accuracy from 39.4% to 47.4%.
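The two ideas in the abstract, soft selection of DB constants by message passing and re-ranking of candidate queries, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`soft_select`, `rerank`), the hand-written 2-d vectors, the fixed 0.5/0.5 mixing weights, and the mean-gate scoring rule are all assumptions made for illustration, whereas the paper's model learns its representations and scoring functions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_select(question_vec, node_vecs, edges, steps=2):
    """Return a relevance gate in (0, 1) for every schema node.

    Hypothetical simplification: each node starts from its similarity to
    the question, then repeatedly mixes in the mean score of its graph
    neighbours, so a column's gate depends on its table and vice versa.
    """
    names = list(node_vecs)
    score = {n: dot(question_vec, node_vecs[n]) for n in names}
    neighbours = {n: [] for n in names}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    for _ in range(steps):
        updated = {}
        for n in names:
            msgs = [score[m] for m in neighbours[n]]
            agg = sum(msgs) / len(msgs) if msgs else 0.0
            # Fixed mixing weights are an assumption; a real GNN learns them.
            updated[n] = 0.5 * score[n] + 0.5 * agg
        score = updated
    return {n: sigmoid(score[n]) for n in names}


def rerank(candidates, gates):
    """Sort (sql, constants) pairs by the mean gate of the constants used.

    Assumed scoring rule, standing in for a learned query-level scorer.
    """
    def score(cand):
        _, constants = cand
        return sum(gates[c] for c in constants) / len(constants)

    return sorted(candidates, key=score, reverse=True)


# Toy schema: two tables, one column each (names are illustrative only).
question_vec = [1.0, 0.0]
node_vecs = {
    "singer": [2.0, 0.0],
    "singer.name": [1.0, 0.0],
    "concert": [-2.0, 0.0],
    "concert.year": [0.5, 0.0],
}
edges = [("singer", "singer.name"), ("concert", "concert.year")]

gates = soft_select(question_vec, node_vecs, edges)
candidates = [
    ("SELECT year FROM concert", ["concert", "concert.year"]),
    ("SELECT name FROM singer", ["singer", "singer.name"]),
]
ranked = rerank(candidates, gates)
print(gates)
print(ranked[0][0])
```

In this toy setup, `concert.year` looks mildly relevant when scored in isolation, but its gate drops once it receives messages from its irrelevant table. That is the kind of global effect that purely local, token-by-token decoding cannot capture.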
