Towards Robustness of Text-to-SQL Models against Synonym Substitution

Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew Purver, John R. Woodward, Jinxia Xie, Pengsheng Huang

TL;DR

Text-to-SQL models often rely on exact lexical matching between words in NL questions and schema terms, which makes them fragile when those words are replaced by synonyms. The authors introduce Spider-Syn, a human-curated dataset derived from Spider that evaluates robustness to paraphrased schema words, and propose two defenses: Multi-Annotation Selection (MAS) and adversarial training with BERT-Attack. Models trained only on Spider lose substantial accuracy on Spider-Syn, while MAS (especially ManualMAS) and adversarial training significantly restore robustness, with MAS offering strong performance without extra training and AutoMAS providing robustness against worst-case attacks. Overall, the work demonstrates practical, scalable strategies to reduce dependence on lexical schema linking and improve cross-domain robustness in text-to-SQL systems.

Abstract

Recently, there has been significant progress in studying neural networks to translate text descriptions into SQL queries. Despite achieving good performance on some public benchmarks, existing text-to-SQL models typically rely on the lexical matching between words in natural language (NL) questions and tokens in table schemas, which may render the models vulnerable to attacks that break the schema linking mechanism. In this work, we investigate the robustness of text-to-SQL models to synonym substitution. In particular, we introduce Spider-Syn, a human-curated dataset based on the Spider benchmark for text-to-SQL translation. NL questions in Spider-Syn are modified from Spider by replacing their schema-related words with manually selected synonyms that reflect real-world question paraphrases. We observe that accuracy drops dramatically when this explicit correspondence between NL questions and table schemas is eliminated, even though the synonyms are not adversarially selected to conduct worst-case attacks. Finally, we present two categories of approaches to improve model robustness. The first category utilizes additional synonym annotations for table schemas by modifying the model input, while the second category is based on adversarial training. We demonstrate that both categories of approaches significantly outperform their counterparts without the defense, and that the first category is more effective.
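
To make the failure mode concrete, here is a minimal sketch of exact-match schema linking and of how extra synonym annotations on the schema side can recover a link that a paraphrase breaks. It is a toy illustration only, not the authors' models or their MAS procedure: the function names, the `singer` schema, the example questions, and the synonym "nation" are all hypothetical, chosen for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def link_schema(question, schema_annotations):
    """Return the schema items that some annotation links to the question.

    schema_annotations maps a schema item (e.g. "table.column") to a list of
    natural-language annotations. An item is linked when every token of at
    least one of its annotations appears in the question (exact match only).
    """
    question_tokens = set(tokenize(question))
    linked = set()
    for item, annotations in schema_annotations.items():
        for annotation in annotations:
            if set(tokenize(annotation)) <= question_tokens:
                linked.add(item)
                break
    return linked


# Hypothetical schema with a single annotation per column.
single_annotation = {
    "singer.name": ["name"],
    "singer.country": ["country"],
    "singer.age": ["age"],
}

# Same schema, with an assumed synonym annotation added for one column.
multi_annotation = {
    "singer.name": ["name"],
    "singer.country": ["country", "nation"],
    "singer.age": ["age"],
}

original = "What is the name and country of each singer?"
paraphrase = "What is the name and nation of each singer?"

links_original = link_schema(original, single_annotation)
links_paraphrase = link_schema(paraphrase, single_annotation)
links_with_synonyms = link_schema(paraphrase, multi_annotation)

print(sorted(links_original))       # both columns are linked
print(sorted(links_paraphrase))     # the country column is no longer linked
print(sorted(links_with_synonyms))  # the synonym annotation restores the link
```

Real text-to-SQL models use learned encoders rather than a set-inclusion check, so the sketch only shows why removing the lexical overlap removes the signal that exact matching depends on, and why enriching the schema annotations is one way to restore it.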

Paper Structure

This paper contains 22 sections, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Sample Spider questions that include the same tokens as the table schema annotations; such questions constitute the majority of the Spider benchmark. In our Spider-Syn benchmark, we replace some schema words in the NL question with their synonyms, without changing the target SQL query.
  • Figure 2: Synonym substitution occurs in cell value words in both Spider and Spider-Syn.
  • Figure 3: Samples of replacing the original words or phrases by synonymous phrases.
  • Figure 4: Examples of synonym substitutions in the 'world' domain from Spider-Syn.
  • Figure 5: Input to BERT-Attack with and without domain information.