Learning Multimodal Cues of Children's Uncertainty

Qi Cheng, Mert İnan, Rahma Mbarki, Grace Grmek, Theresa Choi, Yiming Sun, Kimele Persaud, Jenny Wang, Malihe Alikhani

TL;DR

We present a multimodal machine learning model that predicts uncertainty from a real-time video clip of a participant. The model improves upon a baseline multimodal transformer model and has broad implications for gesture understanding and generation.

Abstract

Understanding uncertainty plays a critical role in achieving common ground (Clark et al., 1983). This is especially important for multimodal AI systems that collaborate with users to solve a problem or guide the user through a challenging concept. In this work, for the first time, we present a dataset annotated in collaboration with developmental and cognitive psychologists for the purpose of studying nonverbal cues of uncertainty. We then present an analysis of the data, studying different roles of uncertainty and its relationship with task difficulty and performance. Lastly, we present a multimodal machine learning model that can predict uncertainty given a real-time video clip of a participant, which we find improves upon a baseline multimodal transformer model. This work informs research on cognitive coordination in human-human and human-AI interaction and has broad implications for gesture understanding and generation. The anonymized version of our data and code will be made publicly available upon completion of the required consent forms and data sheets.

Paper Structure

This paper contains 22 sections, 2 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: A diagram of our multimodal machine learning model. After identifying uncertainty cues in the multimodal transformer, the model passes the cues onto a final multilayer perceptron classifier to output whether the child is expressing uncertainty or not.
  • Figure 2: Schematic of the experimental procedure, depicting the Easy-First condition on the left and the Hard-First condition on the right. As time progresses in the Easy-First condition, the trials advance from easy ratios (2.0) to hard ratios (1.11). In the Hard-First condition, trials move in reverse order, from hard ratios (1.11) to easy ratios (2.0).
  • Figure 3: The distribution of uncertain trials with task difficulty on a scale of 1 (easiest) to 30 (hardest). Uncertainty shows a strong correlation with task difficulty ($r(58)=-.927, p < .01$).
  • Figure 4: Examples of (from left to right) eyebrow raise, eyebrow scrunch, hand on face, funny face, and smile.
  • Figure 5: Distribution of uncertainty cues across all versus uncertain trials and difficult versus easy trials. Delay, eyebrow raise, and eyebrow scrunch are significantly more frequent in uncertain trials.
  • ...and 3 more figures
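
The Figure 3 caption reports a Pearson correlation in the form $r(58)$, where the number in parentheses is the degrees of freedom, $n - 2$, so the statistic is based on 60 paired observations. The sketch below shows how such a value is computed. It is only an illustration: the function name `pearson_r` and the data are hypothetical and synthetic, not the paper's data or analysis code.

```python
import math


def pearson_r(xs, ys):
    """Return (r, degrees_of_freedom) for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y), n - 2


# Synthetic data (an assumption for illustration only): 60 paired points
# with a decreasing trend plus a small alternating perturbation.
xs = list(range(1, 61))
ys = [100 - 1.5 * x + (3 if x % 2 else -3) for x in xs]

r, dof = pearson_r(xs, ys)
print(f"r({dof}) = {r:.3f}")
```

With 60 pairs the degrees of freedom come out to 58, matching the form of the statistic in the caption; the value of `r` here reflects only the synthetic data.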