
Predicting States of Understanding in Explanatory Interactions Using Cognitive Load-Related Linguistic Cues

Yu Wang, Olcay Türk, Angela Grimminger, Hendrik Buschmeier

Abstract

We investigate how verbal and nonverbal linguistic features, exhibited by speakers and listeners in dialogue, can contribute to predicting the listener's state of understanding in explanatory interactions on a moment-by-moment basis. Specifically, we examine three linguistic cues related to cognitive load and hypothesised to correlate with listener understanding: the information value (operationalised with surprisal) and syntactic complexity of the speaker's utterances, and the variation in the listener's interactive gaze behaviour. Based on statistical analyses of the MUNDEX corpus of face-to-face dialogic board game explanations, we find that individual cues vary with the listener's level of understanding. Listener states ('Understanding', 'Partial Understanding', 'Non-Understanding' and 'Misunderstanding') were self-annotated by the listeners using a retrospective video-recall method. The results of a subsequent classification experiment, involving two off-the-shelf classifiers and a fine-tuned German BERT-based multimodal classifier, demonstrate that prediction of these four states of understanding is generally possible and improves when the three linguistic cues are considered alongside textual features.
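The three cognitive-load cues named above, together with the dependency length that the quantification pipeline also averages, can be computed along the following lines. This is a minimal Python sketch that rests on several assumptions this section does not confirm. Information value is taken as the mean per-token surprisal, -log2 P(token | context), with the probabilities supplied by some language model; the section does not say which model the authors used. Gaze entropy is taken as the Shannon entropy of hypothetical gaze-target labels within a window. Dependency length is the mean linear distance between a word and its syntactic head, given a parse as a list of head indices. The syntactic complexity score is not sketched because this section does not define it. All function names are hypothetical.

```python
import math
from collections import Counter


def average_information_value(token_probs: list[float]) -> float:
    """Mean surprisal in bits, -log2 P(token | context), over one utterance.

    Assumption: the probabilities come from some language model; which
    model to use is left open here.
    """
    if not token_probs:
        raise ValueError("utterance has no tokens")
    return sum(-math.log2(p) for p in token_probs) / len(token_probs)


def gaze_entropy(gaze_targets: list[str]) -> float:
    """Shannon entropy in bits of the listener's gaze-target labels in a window.

    Assumption: labels such as "face" or "board" are hypothetical
    annotation categories, not the corpus's actual coding scheme.
    """
    if not gaze_targets:
        raise ValueError("window has no gaze samples")
    n = len(gaze_targets)
    counts = Counter(gaze_targets)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def average_dependency_length(heads: list[int]) -> float:
    """Mean linear distance between each word and its syntactic head.

    heads[i] is the 1-based head index of word i + 1; 0 marks the root,
    which has no incoming dependency and is therefore skipped.
    """
    lengths = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    if not lengths:
        raise ValueError("no dependencies besides the root")
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Illustrative inputs only, not values from the MUNDEX corpus.
    print(average_information_value([0.5, 0.25]))  # 1.5
    print(gaze_entropy(["face", "board", "face", "board"]))  # 1.0
    print(average_dependency_length([3, 3, 0, 3]))  # 1.3333333333333333
```

Higher values of each function correspond to what the abstract treats as higher cognitive load: less predictable words, more scattered gaze, and longer-distance syntactic dependencies.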

This paper contains 20 sections, 4 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Explanation set-up in the MUNDEX corpus (screenshot from one camera perspective). The person on the left is the explainer, who explains a board game; the person on the right is the explainee.
  • Figure 2: The general quantification pipeline for computing the average information value, average gaze entropy, average syntactic complexity score, and average dependency length.
  • Figure 3: Variation of the quantified linguistic cues across the different states of understanding (‘U’). Horizontal bars show statistically significant Dunn's post-hoc tests (Bonferroni-corrected, $\alpha = 0.05$).
  • Figure 4: Understanding-state classification by fusing linguistic cues with a fine-tuned BERT model. We first fine-tuned a German BERT model on the dialogue data from the MUNDEX corpus to learn textual features potentially related to understanding states. We then took its last four hidden layers and fused them with the three significant linguistic cues identified in the statistical analysis (Section \ref{sec:stat}): average information value, average gaze entropy, and syntactic complexity score.
  • Figure 5: Confusion matrix for the German BERT model for classifying understanding states (‘U’) with textual features and linguistic cues.
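Figure 3 marks significant differences with Bonferroni-corrected Dunn's post-hoc tests. The following is a minimal plain-Python sketch of the standard formulation: all observations are jointly ranked, with tied values receiving their average rank. Each pair of groups is then compared with a tie-corrected normal approximation, and the two-sided p-values are multiplied by the number of pairwise comparisons and capped at 1. The function name `dunn_bonferroni` is hypothetical, and this section does not say which software the authors used.

```python
import math
from itertools import combinations


def dunn_bonferroni(groups: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise Dunn's tests with Bonferroni-adjusted two-sided p-values."""
    # Pool all observations and sort them to assign joint ranks.
    pooled = sorted((v, g) for g, vals in groups.items() for v in vals)
    values = [v for v, _ in pooled]
    n_total = len(values)
    ranks = [0.0] * n_total
    tie_sum = 0.0  # sum of (t^3 - t) over tie blocks of size t
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and values[j + 1] == values[i]:
            j += 1
        avg_rank = (i + j + 2) / 2  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_sum += t**3 - t
        i = j + 1

    # Mean rank of each group.
    rank_sums = {g: 0.0 for g in groups}
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    mean_rank = {g: rank_sums[g] / len(groups[g]) for g in groups}

    # Tie-corrected variance term shared by all comparisons.
    base = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    pairs = list(combinations(groups, 2))
    m = len(pairs)  # number of comparisons for the Bonferroni correction
    result = {}
    for a, b in pairs:
        se = math.sqrt(base * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        result[(a, b)] = min(1.0, p * m)
    return result


if __name__ == "__main__":
    # Illustrative cue values per understanding state, not corpus data.
    example = {
        "Understanding": [2.1, 2.4, 2.2, 2.6],
        "Partial Understanding": [2.9, 3.1, 2.8, 3.3],
        "Non-Understanding": [3.5, 3.8, 3.6, 4.0],
    }
    for pair, p_adj in dunn_bonferroni(example).items():
        print(pair, round(p_adj, 4), "significant" if p_adj < 0.05 else "n.s.")
```

A pair would be drawn as a horizontal bar in a plot like Figure 3 when its adjusted p-value falls below $\alpha = 0.05$. The sketch assumes at least two observations in total that are not all tied; otherwise the variance term is zero.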