Taking Action Towards Graceful Interaction: The Effects of Performing Actions on Modelling Policies for Instruction Clarification Requests
Brielen Madureira, David Schlangen
TL;DR
The paper addresses instruction clarification requests (iCRs) in multimodal dialogue by formalizing two tasks, when to ask, $f_{when}: s \mapsto [0,1]$, and what to ask about, $f_{what}: (o_i,s) \mapsto [0,1]$, and tests whether action-taking as an auxiliary objective improves iCR policy learning. A Transformer-based multimodal model for CoDraw is used to compare Overhearer, Action-Taker, and pretrained iCR-Action-Taker variants, testing three hypotheses about action-taking and uncertainty signals. Results show that action-taking yields only limited improvements for deciding when to ask; uncertainty signals exist but are not sufficiently predictive on their own, while predicting what to ask about benefits more from action-informed representations. The work highlights the limits of data-driven learning of meta-communication acts and suggests reinforcement learning and richer evaluation frameworks to better capture effective iCR policies in dynamic settings.
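To make the two signatures concrete, the following is a minimal sketch of their input and output types only. It is not the paper's Transformer-based model: the flat feature vectors, the linear scoring, and the `weights` parameter are hypothetical stand-ins, and reading $s$ as a state representation and $o_i$ as an object representation is an assumption based on the notation.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, mapping any real to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def f_when(state: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical scorer for f_when: s -> [0, 1].

    Returns a score in [0, 1] for asking an iCR given the state features.
    The linear form is an illustrative assumption, not the paper's model.
    """
    return sigmoid(sum(w * x for w, x in zip(weights, state)))


def f_what(obj: Sequence[float], state: Sequence[float],
           weights: Sequence[float]) -> float:
    """Hypothetical scorer for f_what: (o_i, s) -> [0, 1].

    Scores a single object o_i jointly with the state s by concatenating
    their features before the same kind of linear scoring.
    """
    features = list(obj) + list(state)
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


if __name__ == "__main__":
    s = [0.2, -1.0, 0.5]   # toy state features (made up)
    o_i = [1.0, 0.0]       # toy object features (made up)
    print(f_when(s, [0.1, 0.3, -0.2]))
    print(f_what(o_i, s, [0.5, -0.5, 0.1, 0.3, -0.2]))
```

In this sketch `f_when` is evaluated once per state, whereas `f_what` is evaluated once per candidate object in the same state, which mirrors the difference between the two signatures.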
Abstract
Clarification requests are a mechanism to help solve communication problems, e.g. due to ambiguity or underspecification, in instruction-following interactions. Despite their importance, even skilful models struggle with producing or interpreting such repair acts. In this work, we test three hypotheses concerning the effects of action taking as an auxiliary task in modelling policies for Instruction CRs (iCRs). Contrary to initial expectations, we conclude that its contribution to learning an iCR policy is limited, but some information can still be extracted from prediction uncertainty. We present further evidence that even well-motivated, Transformer-based models fail to learn good policies for when to ask iCRs, while the task of determining what to ask about can be more successfully modelled. Considering the implications of these findings, we further discuss the shortcomings of the data-driven paradigm for learning meta-communication acts.
