Table of Contents
Fetching ...

How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings

Shuaichen Chang, Eric Fosler-Lussier

TL;DR

This study systematically examines how prompt construction affects large language model-driven text-to-SQL across zero-shot, single-domain, and cross-domain settings. By comparing schema and content representations for databases and various demonstration prompt strategies, the authors identify key factors that drive performance and propose practical guidelines. Key findings include the importance of table relationships and content in prompts, the existence of a prompt-length sweet spot in cross-domain settings, and the enduring value of in-domain demonstrations for single-domain tasks. The work provides actionable insights to standardize prompt design, improve comparability, and guide future research in text-to-SQL with LLMs.

Abstract

Large language models (LLMs) with in-context learning have demonstrated remarkable capability in the text-to-SQL task. Previous research has prompted LLMs with various demonstration-retrieval strategies and intermediate reasoning steps to enhance the performance of LLMs. However, those works often employ varied strategies when constructing the prompt text for text-to-SQL inputs, such as databases and demonstration examples. This leads to a lack of comparability in both the prompt constructions and their primary contributions. Furthermore, selecting an effective prompt construction has emerged as a persistent problem for future research. To address this limitation, we comprehensively investigate the impact of prompt constructions across various settings and provide insights into prompt constructions for future text-to-SQL studies.

How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings

TL;DR

This study systematically examines how prompt construction affects large language model-driven text-to-SQL across zero-shot, single-domain, and cross-domain settings. By comparing schema and content representations for databases and various demonstration prompt strategies, the authors identify key factors that drive performance and propose practical guidelines. Key findings include the importance of table relationships and content in prompts, the existence of a prompt-length sweet spot in cross-domain settings, and the enduring value of in-domain demonstrations for single-domain tasks. The work provides actionable insights to standardize prompt design, improve comparability, and guide future research in text-to-SQL with LLMs.

Abstract

Large language models (LLMs) with in-context learning have demonstrated remarkable capability in the text-to-SQL task. Previous research has prompted LLMs with various demonstration-retrieval strategies and intermediate reasoning steps to enhance the performance of LLMs. However, those works often employ varied strategies when constructing the prompt text for text-to-SQL inputs, such as databases and demonstration examples. This leads to a lack of comparability in both the prompt constructions and their primary contributions. Furthermore, selecting an effective prompt construction has emerged as a persistent problem for future research. To address this limitation, we comprehensively investigate the impact of prompt constructions across various settings and provide insights into prompt constructions for future text-to-SQL studies.
Paper Structure (29 sections, 9 figures, 5 tables)

This paper contains 29 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: An example of prompt text for 1-shot single-domain text-to-SQL using a snippet of the database Network_1 with a question from the Spider dataset yu2018spider.
  • Figure 2: Examples of the different database schema constructions for a snippet of database Network_1 in Spider.
  • Figure 3: Examples of the different database content constructions for showing 3 cell values in each column for the Highschool table in Figure \ref{['fig:prompt_schema']}.
  • Figure 4: An example of the normalization for database and SQL prompts.
  • Figure 5: Execution accuracy of Codex and ChatGPT for single-domain text-to-SQL with 1, 4, 8, and 16 in-domain examples. RS and Cont correspond to table relationship and table content, respectively. Detailed results can be found in Table \ref{['tab:single_domain_codex_results']} and \ref{['tab:single_domain_chatgpt_results']}.
  • ...and 4 more figures