Taking Action Towards Graceful Interaction: The Effects of Performing Actions on Modelling Policies for Instruction Clarification Requests

Brielen Madureira; David Schlangen

TL;DR

The paper addresses instruction clarification in multimodal dialogue by formalizing two iCR tasks: $f_{when}: s \mapsto [0,1]$ and $f_{what}: (o_i,s) \mapsto [0,1]$, and tests whether action-taking as an auxiliary objective improves iCR policy learning. A Transformer-based multimodal model for CoDraw compares Overhearer, Action-Taker, and pretrained iCR-Action-Taker variants to test three hypotheses about action-taking and uncertainty signals. Results show that action-taking yields limited improvements for when-to-ask, while uncertainty signals exist but are not sufficiently predictive on their own; predicting what to ask benefits more from action-informed representations. The work highlights the limits of data-driven meta-communication learning and suggests reinforcement learning and richer evaluation frameworks to better capture effective iCR policies in dynamic settings.
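The two task signatures above can be written down as plain function types. The following is a minimal sketch, not the authors' model: the `State` fields, the `decide` helper, the 0.5 threshold, and the idea of chaining the two policies are all illustrative assumptions; only the mappings $s \mapsto [0,1]$ and $(o_i, s) \mapsto [0,1]$ come from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical stand-in for the state s (the real state is multimodal)."""
    dialogue: Tuple[str, ...]  # utterances seen so far
    scene: Tuple[str, ...]     # names of cliparts currently in the scene


# f_when: s -> [0, 1], the probability that an iCR should be asked now.
WhenPolicy = Callable[[State], float]
# f_what: (o_i, s) -> [0, 1], the probability that object o_i is asked about.
WhatPolicy = Callable[[str, State], float]


def decide(
    state: State,
    objects: Sequence[str],
    f_when: WhenPolicy,
    f_what: WhatPolicy,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Return the objects to ask about, or an empty list if no iCR is due."""
    if f_when(state) < threshold:
        return []
    return [o for o in objects if f_what(o, state) >= threshold]
```

With toy policies such as `lambda s: 0.9` for when to ask and a per-object lookup for what to ask about, `decide` returns only the objects whose score reaches the threshold.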

Abstract

Clarification requests are a mechanism to help solve communication problems, e.g. due to ambiguity or underspecification, in instruction-following interactions. Despite their importance, even skilful models struggle with producing or interpreting such repair acts. In this work, we test three hypotheses concerning the effects of action taking as an auxiliary task in modelling iCR policies. Contrary to initial expectations, we conclude that its contribution to learning an iCR policy is limited, but some information can still be extracted from prediction uncertainty. We present further evidence that even well-motivated, Transformer-based models fail to learn good policies for when to ask Instruction CRs (iCRs), while the task of determining what to ask about can be more successfully modelled. Considering the implications of these findings, we further discuss the shortcomings of the data-driven paradigm for learning meta-communication acts.

Paper Structure (35 sections, 9 figures, 5 tables)

This paper contains 35 sections, 9 figures, 5 tables.

Introduction
Contributions
Related Work
Learning when to ask questions
Modelling clarification requests
Evaluating CR mechanisms in dialogue models
Definitions
Task 1
Task 2
Hypotheses
Models
Experiments
Implementation
Evaluation metrics
Results
...and 20 more sections

Figures (9)

Figure 1: Clarification requests posed by an instruction follower, demonstrating uncertainty on deciding what actions to take due to ambiguity or underspecification. From: CoDraw dialogue game 8198, CC BY-NC 4.0, cliparts from zitnick2013bringing.
Figure 2: The basic structure of our iCR policy models. The full structure represents the iCR-Action-Taker. The Overhearer contains no action predictor (area shaded in grey), whereas the Action-Taker contains no iCR predictor (area in the dotted box).
Figure 3: Empirical cumulative distribution function of the certainty of taking actions for each clipart (left) and the minimum by turn (right).
Figure 4: Empirical distribution of the number of actions per turn in the CoDraw dataset.
Figure 5: Empirical distribution of the certainty of taking actions for each clipart (top) and the minimum by turn (bottom).
...and 4 more figures
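
The quantities plotted in Figures 3 and 5 (per-clipart certainty, the minimum by turn, and an empirical cumulative distribution) can be computed along the following lines. This is a sketch under stated assumptions: certainty is taken here to be `max(p, 1 - p)` for a binary action probability `p`, which may differ from the paper's definition, and the function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def certainty(p: float) -> float:
    """Assumed definition: max(p, 1 - p), which lies in [0.5, 1]."""
    return max(p, 1.0 - p)


def ecdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical CDF as (value, fraction of samples <= value) pairs."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def min_by_turn(
    certainties: Sequence[float], turn_ids: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Lowest per-clipart certainty within each turn."""
    lowest: Dict[Hashable, float] = {}
    for c, t in zip(certainties, turn_ids):
        lowest[t] = min(c, lowest.get(t, c))
    return lowest
```

Passing the per-turn minima to `ecdf` gives the by-turn curve, while passing all per-clipart certainties gives the per-clipart curve.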
