Zero and Few-shot Semantic Parsing with Ambiguous Inputs

Elias Stengel-Eskin; Kyle Rawlins; Benjamin Van Durme

TL;DR

This work tackles ambiguity in semantic parsing by introducing AmP, an extensible framework and dataset that pairs each ambiguous natural-language utterance with two formal meanings ($LF_0$ and $LF_1$) across five ambiguity types, rendered in first-order logic (FOL) and Lisp. It evaluates large in-context learning models in zero-shot and few-shot settings using three new metrics that quantify how well models capture the distribution over multiple meanings, with constrained decoding used to ensure valid outputs. Without explicit ambiguity signals, large pre-trained models often fail to represent multiple plausible meanings, but they can align with the ambiguity distribution when ambiguity is present in prompts or training data. The paper argues for including ambiguity explicitly in datasets and evaluation protocols, releases AmP with accompanying code, and highlights the potential of interactive disambiguation to improve the robustness of semantic parsing systems.

Abstract

Despite the frequent challenges posed by ambiguity when representing meaning via natural language, it is often ignored or deliberately removed in tasks mapping language to formally-designed representations, which generally assume a one-to-one mapping between linguistic and formal representations. We attempt to address this shortcoming by introducing AmP, a framework, dataset, and challenge for translating ambiguous natural language to formal representations like logic and code. We define templates and generate data for five well-documented linguistic ambiguities. Using AmP, we investigate how several few-shot text-to-code systems handle ambiguity, introducing three new metrics. We find that large pre-trained models perform poorly at capturing the distribution of possible meanings without deliberate instruction. However, models are able to capture the distribution well when ambiguity is attested in their inputs. These results motivate a call for including ambiguity explicitly in datasets and promote considering the distribution of possible outputs when evaluating systems. Data and code: https://github.com/esteng/ambiguous_parsing
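To make the idea of "capturing the distribution of possible meanings" concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `AmbiguousExample` class, the function names, the example utterance, and the logical-form strings are all hypothetical, and the renormalization over exactly two candidate parses is an assumption rather than the paper's metric definition. It assumes per-token log-probabilities for each logical form have already been obtained from a model under a forced decode.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AmbiguousExample:
    """One ambiguous utterance with two candidate logical forms (hypothetical schema)."""
    utterance: str
    lf0: str  # first interpretation
    lf1: str  # second interpretation


def sequence_logprob(token_logprobs):
    """Sum per-token log-probabilities into a sequence log-probability."""
    return sum(token_logprobs)


def normalized_lf0_probability(lf0_token_logprobs, lf1_token_logprobs):
    """Return P(LF_0) renormalized over the two candidate parses (assumed scheme)."""
    lp0 = sequence_logprob(lf0_token_logprobs)
    lp1 = sequence_logprob(lf1_token_logprobs)
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(lp0, lp1)
    log_z = m + math.log(math.exp(lp0 - m) + math.exp(lp1 - m))
    return math.exp(lp0 - log_z)


# Illustrative PP-attachment example; the strings are not taken from the dataset.
example = AmbiguousExample(
    utterance="the man saw the boy with the telescope",
    lf0="saw(man, boy) AND instrument(saw, telescope)",
    lf1="saw(man, boy) AND have(boy, telescope)",
)

# Made-up log-probabilities standing in for model scores under a forced decode.
p_lf0 = normalized_lf0_probability([math.log(0.3)], [math.log(0.1)])
print(f"P(LF_0) = {p_lf0:.2f}, P(LF_1) = {1 - p_lf0:.2f}")
```

Under this sketch, a model that treats both readings as equally plausible would yield a value near 0.5, while a model that commits to a single reading would yield a value near 0 or 1.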

Paper Structure (34 sections, 2 equations, 7 figures, 3 tables)

Introduction
Methods
Data
Models
Computing probability under a forced decode
Metrics
Zero-shot metrics
Few-shot metrics
Experiment 1: Zero-shot parsing
Zero-shot results and analysis
Human validation
Experiment 2: Few-shot parsing
Few-shot results and analysis
Related work
Discussion and Conclusion
...and 19 more sections

Figures (7)

Figure 1: An example of prepositional phrase (PP) attachment ambiguity. The statement is compatible with two possible interpretations, represented visually, in first-order logic, and in Lisp format.
Figure 2: Zero-shot exact-match accuracy by ambiguity type. Cg = Codegen, Ll = Llama, Vi = Vicuna. Models increase in size from left to right.
Figure 3: $P(LF_0)$ per model.
Figure 4: Zero-shot per-token probability (darker is more probable) for each ambiguity type. Alternative parse given below each predicted parse. Token probability sometimes reflects divergences between the parses.
Figure 5: Per-example probabilities derived from humans (left) and cg-16B (right) on LFs. Examples are sorted by probability. Human probabilities vary according to vocabulary choice, but model probabilities generally do not.
...and 2 more figures
