AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries

Irina Saparina; Mirella Lapata

TL;DR

AMBROSIA is a new benchmark intended to inform and inspire the development of text-to-SQL parsers capable of recognizing and interpreting ambiguous requests; experiments on it reveal that even the most advanced LLMs struggle to identify and interpret ambiguity in questions.

Abstract

Practical semantic parsers are expected to understand user utterances and map them to executable programs, even when these are ambiguous. We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-SQL parsers capable of recognizing and interpreting ambiguous requests. Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness), their interpretations, and corresponding SQL queries. In each case, the ambiguity persists even when the database context is provided. This is achieved through a novel approach that involves controlled generation of databases from scratch. We benchmark various LLMs on AMBROSIA, revealing that even the most advanced models struggle to identify and interpret ambiguity in questions.
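
To make the idea of one question with several valid SQL readings concrete, here is a minimal sketch of scope ambiguity using Python's built-in sqlite3 module. The table gym_activities, its rows, the example question, and both queries are hypothetical illustrations and are not taken from the AMBROSIA dataset. The "distributive" reading lists activities gym by gym; the "collective" reading keeps only activities that every gym offers.

```python
import sqlite3

# Hypothetical toy data (not from the benchmark): which gym offers which activity.
ROWS = [
    ("Gym A", "yoga"),
    ("Gym A", "boxing"),
    ("Gym B", "yoga"),
    ("Gym B", "swimming"),
]

# Hypothetical ambiguous question: "each gym" can take wide or narrow scope.
QUESTION = "What activities does each gym offer?"

# One SQL query per interpretation of the same question.
INTERPRETATIONS = {
    # Distributive reading: for every gym, list the activities it offers.
    "distributive": (
        "SELECT gym, activity FROM gym_activities "
        "ORDER BY gym, activity"
    ),
    # Collective reading: list only the activities offered by all gyms.
    "collective": (
        "SELECT activity FROM gym_activities "
        "GROUP BY activity "
        "HAVING COUNT(DISTINCT gym) = "
        "(SELECT COUNT(DISTINCT gym) FROM gym_activities) "
        "ORDER BY activity"
    ),
}


def build_db():
    """Create an in-memory database holding the toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gym_activities (gym TEXT, activity TEXT)")
    conn.executemany("INSERT INTO gym_activities VALUES (?, ?)", ROWS)
    return conn


def run_interpretations(conn):
    """Execute every interpretation and return its result rows by name."""
    return {
        name: conn.execute(sql).fetchall()
        for name, sql in INTERPRETATIONS.items()
    }


if __name__ == "__main__":
    results = run_interpretations(build_db())
    print(QUESTION)
    for name, rows in results.items():
        print(f"{name}: {rows}")
```

Because the two queries return different rows on the same database, the database context alone does not settle which reading the user intended, which is the property the abstract describes.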

Paper Structure (88 sections, 7 figures, 7 tables)

Introduction
Related Work
Formal Definition of Ambiguity
Design Considerations
Executable Logical Forms
Databases that Support Ambiguity
Different Ambiguity Types
Database Generation
Domains, Concepts, and Relations
Database Generation via SQL statements
Question and SQL Annotation
Scope and Attachment Ambiguity
Vagueness
Dataset Analysis
Experiments
...and 73 more sections

Figures (7)

Figure 1: Types of ambiguous questions (highlighted in blue), their interpretations (highlighted in green), and corresponding SQL queries. Database elements that could lead to ambiguity are highlighted in orange.
Figure 2: Annotation process for scope ambiguity in the "Health" domain.
Figure 3: The prompt, templates, in-context examples (only one out of ten is shown for brevity; see the appendix for the full versions), and predictions of key concepts and relations for each ambiguity type. Generated key concepts and relations later become sources of ambiguity in questions and databases (shown at the bottom for illustrative purposes).
Figure 4: Recall, precision, and AllFound metrics for zero-shot and few-shot Llama3-70B. In-context examples are selected randomly. We obtain best results with 1-3 examples.
Figure 5: Database configurations that support attachment ambiguity.
...and 2 more figures

Theorems & Definitions (2)

Definition 1
Definition 2
