A Taxonomy of Ambiguity Types for NLP

Margaret Y. Li, Alisa Liu, Zhaofeng Wu, Noah A. Smith

TL;DR

Ambiguity is essential to human language but underexplored in NLP. The paper introduces an eleven-type taxonomy of English ambiguity to enable fine-grained data analysis and model evaluation, and outlines plans to annotate and rebalance the AmbiEnt benchmark to study type-specific model performance. The approach emphasizes that different ambiguity types require distinct resolution strategies and can reveal distinct model weaknesses. By enabling targeted tasks and type-aware benchmarks, the work aims to improve NLP systems' robustness in handling real-world ambiguity.

Abstract

Ambiguity is a critical component of language that allows for more effective communication between speakers, but is often ignored in NLP. Recent work suggests that NLP systems may struggle to grasp certain elements of human language understanding because they may not handle ambiguities at the level that humans naturally do in communication. Additionally, different types of ambiguity may serve different purposes and require different approaches for resolution, and we aim to investigate how language models' abilities vary across types. We propose a taxonomy of ambiguity types as seen in English to facilitate NLP analysis. Our taxonomy can help make meaningful splits in language ambiguity data, allowing for more fine-grained assessments of both datasets and model performance.

Paper Structure (16 sections, 1 table)