Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions
Rui Zhang, Tao Yu, He Yang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, Dragomir Radev
TL;DR
The paper tackles cross-domain, context-dependent text-to-SQL generation with an editing-based approach that reuses the previously predicted SQL query at the token level. It introduces an utterance-table encoder, a turn-aware interaction encoder, and a table-aware decoder, augmented with a query editing mechanism that can either copy tokens from the prior query or insert new ones. Evaluated on SParC and Spider, the method outperforms state-of-the-art baselines that generate SQL from scratch, especially when combined with utterance-table BERT embeddings, and it is robust to error propagation. The framework integrates user utterances, table schemas, and interaction history through editing-based generation.
Abstract
We focus on the cross-domain context-dependent text-to-SQL generation task. Based on the observation that adjacent natural language questions are often linguistically dependent and their corresponding SQL queries tend to overlap, we utilize the interaction history by editing the previously predicted query to improve the generation quality. Our editing mechanism views SQL queries as token sequences and reuses generation results at the token level in a simple manner. It can flexibly change individual tokens and is robust to error propagation. Furthermore, to deal with complex table structures in different domains, we employ an utterance-table encoder and a table-aware decoder to incorporate the context of the user utterance and the table schema. We evaluate our approach on the SParC dataset and demonstrate the benefit of editing over state-of-the-art baselines that generate SQL from scratch. Our code is available at https://github.com/ryanzhumich/sparc_atis_pytorch.
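To make the token-level editing idea concrete, here is a minimal sketch of a single decoding step. It assumes the decoder mixes two distributions through a scalar gate: one over tokens copied from the previous query and one over newly generated tokens. The function name `edit_step_distribution`, the gate `p_copy`, and all numbers are hypothetical and chosen for illustration; they are not taken from the released code.

```python
from collections import defaultdict


def edit_step_distribution(prev_query, copy_attention, vocab_probs, p_copy):
    """Mix copying from the previous query with generating a new token.

    prev_query     -- tokens of the previously predicted SQL query
    copy_attention -- one attention weight per previous token (sums to 1)
    vocab_probs    -- probability of each newly generated token (sums to 1)
    p_copy         -- assumed scalar gate in [0, 1] favoring the copy path
    """
    if len(prev_query) != len(copy_attention):
        raise ValueError("need one attention weight per previous-query token")

    mixed = defaultdict(float)
    # Copy path: mass flows to tokens that appear in the previous query.
    for token, weight in zip(prev_query, copy_attention):
        mixed[token] += p_copy * weight
    # Insert path: mass flows to tokens generated from the output vocabulary.
    for token, prob in vocab_probs.items():
        mixed[token] += (1.0 - p_copy) * prob
    return dict(mixed)


# Toy values, purely illustrative.
prev_query = ["SELECT", "name", "FROM", "student"]
copy_attention = [0.1, 0.6, 0.1, 0.2]
vocab_probs = {"SELECT": 0.2, "WHERE": 0.5, "age": 0.3}

dist = edit_step_distribution(prev_query, copy_attention, vocab_probs, p_copy=0.7)
best = max(dist, key=dist.get)
```

Because each token is decided independently, an error in the previous query does not have to be carried forward: wherever copying would be wrong, a low gate value lets the insert path override it.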
