
"Mm, Wat?" Detecting Other-initiated Repair Requests in Dialogue

Anh Ngo, Nicolas Rollet, Catherine Pelachaud, Chloe Clavel

TL;DR

This work tackles the detection of Other-Initiated Repair (OIR) initiation in Dutch dialogue by building a multimodal model that fuses linguistic and prosodic cues grounded in Conversation Analysis. The approach combines RobBERT-based text embeddings, Whisper-based audio representations, and handcrafted linguistic and prosodic features, augmented with dialogue micro-context and a multihead attention fusion mechanism. Empirical results show that multimodal models outperform unimodal variants. The handcrafted features also make the model interpretable: SHAP analyses highlight key prosodic and linguistic markers such as pauses, intensity, harmonics-to-noise ratio (HNR), and coreference patterns. Adding micro-context further improves detection performance. The findings underscore the complementary roles of how something is said and what is said in signaling repair initiation, and point to practical paths for improving conversational agents' ability to recognize repair initiation, including future expansion to visual cues and multilingual corpora.

Abstract

Maintaining mutual understanding is key to avoiding breakdowns in human-human conversation, and repair plays a vital role in it, particularly Other-Initiated Repair (OIR), in which one speaker signals trouble and prompts the other to resolve it. However, Conversational Agents (CAs) still fail to recognize user repair initiation, leading to breakdowns or disengagement. This work proposes a multimodal model to automatically detect repair initiation in Dutch dialogues by integrating linguistic and prosodic features grounded in Conversation Analysis. The results show that prosodic cues complement linguistic features and significantly improve the results of pretrained text and audio embeddings, offering insights into how different features interact. Future directions include incorporating visual cues and exploring multilingual and cross-context corpora to assess robustness and generalizability.
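To make the fusion idea from the TL;DR concrete, the following is a minimal sketch of attention-based fusion over three modality vectors (text, audio, handcrafted features). It is not the authors' implementation. It assumes all modalities have already been projected to a shared dimension, uses identity query/key/value projections instead of learned ones, and mean-pools the attended tokens. The function names, head count, and toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
        )
    return attended


def multihead_fuse(modalities, num_heads=2):
    """Fuse equal-length modality vectors into one vector.

    Each vector is split into `num_heads` slices, every slice attends across
    modalities, the slices are concatenated back, and the result is mean-pooled.
    """
    dim = len(modalities[0])
    if any(len(vec) != dim for vec in modalities):
        raise ValueError("all modality vectors must share one dimension")
    if dim % num_heads != 0:
        raise ValueError("dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    rows = [[] for _ in modalities]
    for head in range(num_heads):
        start, stop = head * head_dim, (head + 1) * head_dim
        head_out = attend([vec[start:stop] for vec in modalities])
        for row, part in zip(rows, head_out):
            row.extend(part)
    count = len(modalities)
    return [sum(row[i] for row in rows) / count for i in range(dim)]


# Toy stand-ins for a text embedding, an audio embedding, and handcrafted features.
text_vec = [1.0, 0.0, 0.0, 0.0]
audio_vec = [0.0, 1.0, 0.0, 0.0]
handcrafted_vec = [0.0, 0.0, 1.0, 1.0]
fused = multihead_fuse([text_vec, audio_vec, handcrafted_vec], num_heads=2)
```

In a trained model the projections would be learned and the fused vector would feed a classifier. The sketch only shows how each modality's representation is reweighted by its similarity to the others before pooling.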

Paper Structure

This paper contains 38 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Other-initiated Repair (OIR) sequence example from Rasenberg et al. (2022), translated into English: the repair initiation (green) signals trouble with the ambiguous object reference "disc" by offering the candidate understanding "horizontally", which the repair solution "yes horizontally" confirms.
  • Figure 2: OIR sequence organization between 2 speakers A (green) and B (red): (a) Minimal; (b) Non-minimal
  • Figure 3: Multimodal architecture for repair initiation detection
  • Figure 4: Handcrafted linguistic and prosodic features design
  • Figure 5: The top 10 most important handcrafted features ranked by SHAP value. The feature-importance appendix provides the full list of the 20 highest-contributing features.
  • ...and 4 more figures

Theorems & Definitions (3)

  • Example 1
  • Example 2
  • Example 3