Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions

Rui Zhang, Tao Yu, He Yang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, Dragomir Radev

TL;DR

The paper addresses cross-domain, context-dependent text-to-SQL generation with an editing-based approach that reuses the previously predicted SQL query at the token level. It combines an utterance-table encoder, a turn-aware interaction encoder, and a table-aware decoder with a query editing mechanism that can either copy tokens from the prior query or insert new ones. Evaluated on SParC and Spider, the method improves over state-of-the-art baselines, especially when combined with utterance-table BERT embeddings, and is robust to error propagation. The framework integrates user utterances, table schemas, and interaction history through editing-based generation.

Abstract

We focus on the cross-domain context-dependent text-to-SQL generation task. Based on the observation that adjacent natural language questions are often linguistically dependent and their corresponding SQL queries tend to overlap, we utilize the interaction history by editing the previous predicted query to improve the generation quality. Our editing mechanism views SQL as sequences and reuses generation results at the token level in a simple manner. It is flexible to change individual tokens and robust to error propagation. Furthermore, to deal with complex table structures in different domains, we employ an utterance-table encoder and a table-aware decoder to incorporate the context of the user utterance and the table schema. We evaluate our approach on the SParC dataset and demonstrate the benefit of editing compared with the state-of-the-art baselines which generate SQL from scratch. Our code is available at https://github.com/ryanzhumich/sparc_atis_pytorch.
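
The token-level reuse described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Copy`, `Insert`, and `apply_edits` are hypothetical, the example queries are invented, and the learned decoder that would choose each operation is replaced here by a hand-written list of operations.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Copy:
    """Reuse the token at position `index` of the previous query (hypothetical op)."""
    index: int


@dataclass(frozen=True)
class Insert:
    """Emit a new token that is not taken from the previous query (hypothetical op)."""
    token: str


EditOp = Union[Copy, Insert]


def apply_edits(previous_query: List[str], ops: List[EditOp]) -> List[str]:
    """Build the new query one token at a time from copy/insert operations."""
    output: List[str] = []
    for op in ops:
        if isinstance(op, Copy):
            # Token-level reuse: take the token directly from the previous query.
            output.append(previous_query[op.index])
        else:
            # Otherwise generate a fresh token (e.g. a SQL keyword or column name).
            output.append(op.token)
    return output


# Invented two-turn example: the follow-up question adds a filter.
previous = "SELECT name FROM student".split()
ops: List[EditOp] = [
    Copy(0), Copy(1), Copy(2), Copy(3),
    Insert("WHERE"), Insert("age"), Insert(">"), Insert("20"),
]
new_query = apply_edits(previous, ops)
print(" ".join(new_query))  # SELECT name FROM student WHERE age > 20
```

Because every output position is decided independently, a wrong token in the previous query can simply be skipped or overwritten with an insert, which is one way to read the abstract's claim that the mechanism is flexible and robust to error propagation.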

Paper Structure

This paper contains 16 sections, 14 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Model architecture of editing the previous query with attentions to the user utterances, the table schema, and the previously generated query.
  • Figure 2: Utterance-Table Encoder for the example in (a).
  • Figure 3: Number of operations at different turns.
  • Figure 4: Performance split by different turns (Left) and hardness levels (Right) on SParC dev set.
  • Figure 5: Effect of query editing at different turns on SParC dev set.