Reasoning About Intent for Ambiguous Requests

Irina Saparina, Mirella Lapata

TL;DR

The paper tackles ambiguity in user requests to large language models by proposing a single-turn approach that generates multiple interpretation–answer pairs. Models are trained with reinforcement learning (DAPO), rewarding recall on ambiguous requests and precision on unambiguous ones, so that coverage of valid interpretations does not come at the cost of correctness. On Abg-CoQA (conversational QA) and Ambrosia (text-to-SQL), the method covers more valid interpretations than baseline approaches, and human evaluation finds the predicted interpretations well aligned with their answers. Because interpretations are explicit, produced in a single generation step, and emitted in a structured format, the approach supports transparency, efficiency, and downstream use.
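To make the recall/precision idea concrete, here is a minimal sketch of what such a reward could look like. It is an illustration under stated assumptions, not the paper's actual reward: the function names `normalize` and `coverage_reward`, the exact-match comparison after lowercasing, and the boolean `ambiguous` flag are all hypothetical choices made for this example.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def coverage_reward(predicted: list[str], gold: list[str], ambiguous: bool) -> float:
    """Hypothetical reward: recall for ambiguous requests, precision otherwise.

    Assumption: answers are compared by normalized exact match. A real system
    would likely need a task-specific check (for example, comparing SQL
    execution results for text-to-SQL).
    """
    pred = {normalize(a) for a in predicted}
    ref = {normalize(a) for a in gold}
    if not pred or not ref:
        return 0.0
    hits = len(pred & ref)
    if ambiguous:
        # Recall: fraction of valid answers that the response covers.
        return hits / len(ref)
    # Precision: fraction of produced answers that are valid.
    return hits / len(pred)
```

Under this sketch, an ambiguous request is rewarded for covering more of the valid answers, while an unambiguous one is penalized for padding the response with extra, incorrect interpretations.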

Abstract

Large language models often respond to ambiguous requests by implicitly committing to one interpretation. Intent misunderstandings can frustrate users and create safety risks. To address this, we propose generating multiple interpretation-answer pairs in a single structured response to ambiguous requests. Our models are trained with reinforcement learning and customized reward functions using multiple valid answers as supervision. Experiments on conversational question answering and semantic parsing demonstrate that our method achieves higher coverage of valid answers than baseline approaches. Human evaluation confirms that predicted interpretations are highly aligned with their answers. Our approach promotes transparency with explicit interpretations, achieves efficiency by requiring only one generation step, and supports downstream applications through its structured output format.

Paper Structure

This paper contains 42 sections, 5 equations, 1 figure, and 10 tables.

Figures (1)

  • Figure 1: Reasoning length (number of characters) vs. coverage (ambiguous subsets).