Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis
Daniel Gomm, Cornelius Wolff, Madelon Hulsebos
TL;DR
The paper treats ambiguity in natural language queries for tabular data analysis not as a defect but as a signal of user intent and of how the work of grounding a query is divided. The authors propose a cooperative query framework that distinguishes unambiguous, cooperative, and uncooperative queries, in which the user and the system share responsibility for grounding both the analytical procedure and the data scope. An analysis of 15 benchmarks reveals widespread data-privileged and underspecified queries, which challenges evaluations that measure execution alone and highlights the need to assess interpretation capabilities separately. The paper advocates stratified evaluation, datasets annotated with grounding levels, and iterative grounding datasets, and argues for cooperative system designs that disclose grounding choices and support clarification to advance open-domain tabular data analysis.
Abstract
Natural language interfaces to tabular data must handle ambiguities inherent to queries. Instead of treating ambiguity as a deficiency, we reframe it as a feature of cooperative interaction where users are intentional about the degree to which they specify queries. We develop a principled framework based on a shared responsibility of query specification between user and system, distinguishing unambiguous and ambiguous cooperative queries, which systems can resolve through reasonable inference, from uncooperative queries that cannot be resolved. Applying the framework to evaluations for tabular question answering and analysis, we analyze the queries in 15 popular datasets and observe an uncontrolled mixing of query types that is adequate neither for evaluating a system's execution accuracy nor for evaluating its interpretation capabilities. This conceptualization around cooperation in resolving queries informs how to design and evaluate natural language interfaces for tabular data analysis, for which we distill concrete directions for future research and broader implications.
