AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries

Irina Saparina, Mirella Lapata

TL;DR

AMBROSIA is a new benchmark that the authors hope will inform and inspire the development of text-to-SQL parsers capable of recognizing and interpreting ambiguous requests. Benchmarking various LLMs on it reveals that even the most advanced models struggle to identify and interpret ambiguity in questions.

Abstract

Practical semantic parsers are expected to understand user utterances and map them to executable programs, even when these are ambiguous. We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-SQL parsers capable of recognizing and interpreting ambiguous requests. Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness), their interpretations, and corresponding SQL queries. In each case, the ambiguity persists even when the database context is provided. This is achieved through a novel approach that involves controlled generation of databases from scratch. We benchmark various LLMs on AMBROSIA, revealing that even the most advanced models struggle to identify and interpret ambiguity in questions.
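
To make the idea of one question mapping to several SQL queries concrete, the following is a minimal sketch of scope ambiguity using Python's built-in `sqlite3` module. The schema, the rows, the question, and the names `build_db`, `run`, `DISTRIBUTIVE`, and `COLLECTIVE` are all hypothetical illustrations and are not taken from the AMBROSIA dataset. The point is only that both readings remain valid against the same database.

```python
import sqlite3

# Hypothetical question with scope ambiguity (not taken from the dataset).
QUESTION = "What activities does each gym offer?"


def build_db() -> sqlite3.Connection:
    """Create a tiny in-memory database; the schema and rows are made up."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gym (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE activity (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE gym_activity (gym_id INTEGER, activity_id INTEGER);
        INSERT INTO gym VALUES (1, 'Downtown'), (2, 'Riverside');
        INSERT INTO activity VALUES (1, 'Yoga'), (2, 'Boxing'), (3, 'Spinning');
        INSERT INTO gym_activity VALUES (1, 1), (1, 2), (2, 1), (2, 3);
        """
    )
    return conn


# Reading 1 (distributive): for every gym, list the activities it offers.
DISTRIBUTIVE = """
SELECT g.name, a.name
FROM gym AS g
JOIN gym_activity AS ga ON ga.gym_id = g.id
JOIN activity AS a ON a.id = ga.activity_id
ORDER BY g.name, a.name
"""

# Reading 2 (collective): list only the activities that all gyms share.
COLLECTIVE = """
SELECT a.name
FROM activity AS a
JOIN gym_activity AS ga ON ga.activity_id = a.id
GROUP BY a.id, a.name
HAVING COUNT(DISTINCT ga.gym_id) = (SELECT COUNT(*) FROM gym)
ORDER BY a.name
"""


def run(conn: sqlite3.Connection, sql: str) -> list:
    """Execute one query and return all result rows."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(QUESTION)
    print("distributive:", run(conn, DISTRIBUTIVE))
    print("collective:  ", run(conn, COLLECTIVE))
```

Because the two queries return different result sets on the same data, a parser that emits only one of them silently commits to a single reading of the question.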

Paper Structure (88 sections, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Types of ambiguous questions (highlighted in blue), their interpretations (highlighted in green), and corresponding SQL queries. Database elements that could lead to ambiguity are highlighted in orange.
  • Figure 2: Annotation process for scope ambiguity in the "Health" domain.
  • Figure 3: The prompt, templates, in-context examples (only one out of ten is shown for brevity; see the appendix for the full versions), and predictions of key concepts and relations for each ambiguity type. Generated key concepts and relations later become sources of ambiguity in questions and databases (shown at the bottom for illustrative purposes).
  • Figure 4: Recall, precision, and AllFound metrics for zero-shot and few-shot Llama3-70B. In-context examples are selected randomly. We obtain best results with 1-3 examples.
  • Figure 5: Database configurations that support attachment ambiguity.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2