CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases

Tao Yu, Rui Zhang, He Yang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter S Lasecki, Dragomir Radev

TL;DR

CoSQL presents the first large-scale, cross-domain conversational text-to-SQL corpus collected under a Wizard-of-Oz setup, featuring 3,007 dialogues, 30k+ turns, and 10k+ annotated SQL queries over 200 databases spanning 138 domains. It defines three interconnected tasks: SQL-grounded dialogue state tracking, response generation from SQL query results, and user dialogue act prediction. Strong baselines for each task perform far from perfectly, showing that generalizing to unseen databases remains difficult. The dataset requires faithful natural-language descriptions of SQL queries and their execution results, and it highlights the need for robust handling of ambiguous and unanswerable questions. The work provides a public benchmark and leaderboards to spur progress toward practical, general-purpose NL interfaces to databases.

Abstract

We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing the user that a question is unanswerable. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot-value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.
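To make "SQL-grounded dialogue state" concrete, here is a minimal sketch in Python. It assumes a hypothetical toy `singer` table and a hypothetical `Turn` record; neither is CoSQL's actual schema or file format. Only the "ambiguous"/"clarify" act pair comes from the Figure 1 caption; the other act labels are illustrative placeholders. The point it shows: instead of slot-value pairs, the state at each turn is an executable SQL query ($S_i$) that can carry context forward (here, `country = 'France'`) and is run against the DB to produce the answer ($A_i$). When the expert asks for clarification, no query is written.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy database; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ana', 'France', 29), (2, 'Ben', 'France', 41), (3, 'Chen', 'China', 35);
""")

@dataclass
class Turn:
    question: str          # user input Q_i
    user_act: str          # user dialogue act (labels other than "ambiguous" are placeholders)
    sql: Optional[str]     # S_i: the SQL-grounded state; None when no query is written
    expert_act: str        # expert response act (e.g. "clarify")

def run_turn(conn: sqlite3.Connection, turn: Turn):
    """Execute the turn's SQL state, if any, and return the answer rows A_i."""
    if turn.sql is None:
        return None
    return conn.execute(turn.sql).fetchall()

dialogue = [
    Turn("How many singers are from France?", "inform_sql",
         "SELECT count(*) FROM singer WHERE country = 'France'", "describe_sql"),
    # Ambiguous follow-up: the expert asks for clarification instead of writing SQL.
    Turn("What about the old ones?", "ambiguous", None, "clarify"),
    # The new state keeps the earlier constraint and adds the clarified one.
    Turn("Older than 30.", "inform_sql",
         "SELECT name FROM singer WHERE country = 'France' AND age > 30", "describe_sql"),
]

for t in dialogue:
    print(t.question, "->", run_turn(conn, t))
```

Because the state is a query rather than a domain-specific slot set, the same representation works unchanged on any database schema. That property is what makes testing on unseen databases possible.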

Paper Structure

This paper contains 68 sections, 11 figures, 7 tables.

Figures (11)

  • Figure 1: A dialog from the CoSQL dataset. Gray boxes separate the user inputs ($Q_i$) querying the database ($D_i$) from the SQL queries ($S_i$), returned answers ($A_i$), and expert responses ($R_i$). Users send an input to the expert, who writes the corresponding SQL query (only seen by the expert) if possible and sends an answer and response description back. Dialogue acts are on the right-hand side (e.g., $Q_3$ is "ambiguous" and $R_3$ is "clarify").
  • Figure 2: Distributions of dialogue lengths.
  • Figure 3: Distributions of user dialog action types.
  • Figure 4: SQL keyword counts.
  • Figure 5: Percentage of question sequences that contain a particular SQL keyword at a specific user utterance turn. Keyword occurrences in CoSQL (upper) fluctuate slightly as the interaction proceeds, while those in SParC (lower) show a clear increasing trend.
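The Figure 5 caption describes a per-turn statistic. One way to compute it is sketched below in Python, under stated assumptions that the caption does not confirm. Each dialogue is treated as a list of SQL strings, one per user turn. The denominator at turn t is assumed to be the number of sequences that reach turn t. Keyword presence is checked with a case-insensitive whole-word regex, which is a simplification of parsing the SQL. The function name and the sample queries are hypothetical.

```python
import re
from collections import defaultdict

def keyword_rate_by_turn(dialogues, keyword):
    """Return {turn_index: percentage} of sequences whose SQL at that
    (1-based) turn contains `keyword`. Assumed denominator: sequences
    that have a query at that turn."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for queries in dialogues:
        for t, sql in enumerate(queries, start=1):
            totals[t] += 1
            if pattern.search(sql):
                hits[t] += 1
    return {t: 100.0 * hits[t] / totals[t] for t in sorted(totals)}

# Hypothetical question sequences (not CoSQL data).
dialogues = [
    ["SELECT count(*) FROM singer", "SELECT name FROM singer WHERE age > 30"],
    ["SELECT name FROM singer WHERE country = 'France'", "SELECT name FROM singer ORDER BY age"],
    ["SELECT name FROM singer"],
]
print(keyword_rate_by_turn(dialogues, "WHERE"))
```

Normalizing by the sequences that reach each turn keeps later turns comparable even though fewer dialogues are that long. Dividing by the total number of dialogues instead would make every keyword appear to decline at late turns.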