"Mm, Wat?" Detecting Other-initiated Repair Requests in Dialogue

Anh Ngo; Nicolas Rollet; Catherine Pelachaud; Chloe Clavel

TL;DR

This work tackles the detection of Other-Initiated Repair (OIR) initiation in Dutch dialogue by building a multimodal model that fuses linguistic and prosodic cues grounded in Conversation Analysis. The approach combines RobBERT-based text embeddings, Whisper-based audio representations, and handcrafted linguistic and prosodic features, augmented with dialogue micro-context and fused through a multi-head attention mechanism. Empirical results show that multimodal models outperform unimodal variants. The handcrafted features add interpretability: SHAP analyses highlight key prosodic and linguistic markers such as pauses, intensity, harmonics-to-noise ratio (HNR), and coreference patterns. Adding micro-context further improves detection performance. The findings underscore the complementary roles of how something is said and what is said in signaling repair initiation, and they point to practical ways of improving how conversational agents handle repair initiation, including future extension to visual cues and multilingual corpora.

Abstract

Maintaining mutual understanding is key to avoiding breakdowns in human-human conversation, and repair plays a vital role in it, particularly Other-Initiated Repair (OIR), in which one speaker signals trouble and prompts the other to resolve it. However, Conversational Agents (CAs) still fail to recognize user repair initiation, leading to breakdowns or disengagement. This work proposes a multimodal model that automatically detects repair initiation in Dutch dialogues by integrating linguistic and prosodic features grounded in Conversation Analysis. The results show that prosodic cues complement linguistic features and significantly improve the results of pretrained text and audio embeddings, offering insights into how different features interact. Future directions include incorporating visual cues and exploring multilingual and cross-context corpora to assess robustness and generalizability.