Nonverbal Immediacy Analysis in Education: A Multimodal Computational Model

Uroš Petković, Jonas Frenkel, Olaf Hellwich, Rebecca Lazarides

TL;DR

This work tackles the challenge of quantifying teachers' nonverbal immediacy (NVI) in classrooms using RGB video by integrating facial expressions, gesture intensity, and perceived distance. It introduces a multimodal pipeline with tracking/segmentation, a ResNet-18-based gesture intensity regressor, a depth-aware perceived distance regressor, facial expression analysis via HSEmotion, and an NVI fusion model implemented as a three-layer MLP. On a TALIS-derived German classroom dataset (400 clips, 46 teachers; 3056 labeled frames from 3 raters), the gesture intensity regressor achieves $r=0.84$, the distance regressor $r=0.55$, and the NVI model $r=0.44$ relative to human ratings, with ICC$(2,3)$ values around $0.61$–$0.69$ when combining model and human raters. External validation shows moderate correlations between model-derived NVI and student outcomes (e.g., interest in mathematics, cognitive activation, perceived enthusiasm, socio-emotional support), supporting the method’s potential to augment human observer assessments. Overall, the study provides a foundational, RGB-only framework for scalable, multimodal analysis of teacher nonverbal behavior and opens avenues for richer educational insights and interventions.

Abstract

This paper introduces a novel computational approach for analyzing nonverbal social behavior in educational settings. Integrating multimodal behavioral cues, including facial expressions, gesture intensity, and spatial dynamics, the model assesses the nonverbal immediacy (NVI) of teachers from RGB classroom videos. A dataset of 400 video segments, each 30 seconds long, from German classrooms was constructed for model training and validation. The gesture intensity regressor achieved a correlation of 0.84, the perceived distance regressor 0.55, and the NVI model 0.44 with median human ratings. The model demonstrates the potential to provide valuable support in nonverbal behavior assessment, approximating the accuracy of individual human raters. Validated against both questionnaire data and trained observer ratings, our models show moderate to strong correlations with relevant educational outcomes, indicating their efficacy in reflecting effective teaching behaviors. This research advances the objective assessment of nonverbal communication behaviors, opening new pathways for educational research.
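
The reported figures are correlations between model outputs and the median of the human ratings. As a rough illustration of that comparison, the sketch below computes a correlation against per-clip median ratings. It assumes Pearson's r (the summary writes r but does not name the coefficient), and the function names and toy values are hypothetical, not taken from the paper.

```python
import math
from statistics import median


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_with_median_rating(model_scores, rater_scores):
    """Correlate model scores with the per-item median of the human ratings.

    rater_scores holds one list of ratings per item (e.g. three raters).
    """
    medians = [median(ratings) for ratings in rater_scores]
    return pearson_r(model_scores, medians)


# Made-up toy values for illustration only; not data from the paper.
toy_model = [1.0, 2.0, 3.0, 4.0]
toy_raters = [[1, 1, 2], [2, 3, 2], [3, 3, 4], [4, 5, 4]]
toy_r = correlation_with_median_rating(toy_model, toy_raters)
```

Taking the median before correlating keeps a single outlying rater from dominating the reference score for an item.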

Paper Structure (18 sections, 3 figures, 1 table)

Figures (3)

  • Figure 1: Examples of different gesture intensities with similar extracted skeletons. In one example, the person is pointing, which represents a specific hand gesture. In the other, the person is leaning on the whiteboard, which is not considered gesturing.
  • Figure 2: Histograms showing the distribution of ratings for perceived distance, gesture intensity, and nonverbal immediacy.
  • Figure 3: Pipeline overview: The model takes RGB videos as input, performing tracking, segmentation, and depth estimation. We extract facial expressions and gesture intensity from the teacher's segmentation masks. To estimate perceived distance, we use estimated depth images and segmentation masks from both the teacher and students. These three features (facial expressions, gesture intensity, and perceived distance) converge in the NVI model to calculate the NVI score.
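
The fusion step in the Figure 3 caption, where the three feature streams converge in a three-layer MLP, can be sketched roughly as follows. This is a minimal forward pass under stated assumptions: the hidden width, the ReLU activations, the facial feature dimensionality, and the random initial weights are all hypothetical choices for illustration, since the summary gives only the layer count.

```python
import random


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    """Affine map; weights is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (assumed initialization, untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_mlp(n_in, hidden, rng):
    """Three linear layers: n_in -> hidden -> hidden -> 1 (hidden is an assumption)."""
    sizes = [n_in, hidden, hidden, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def nvi_forward(layers, facial, gesture_intensity, perceived_distance):
    """Concatenate the three feature streams and map them to one NVI score."""
    v = list(facial) + [gesture_intensity, perceived_distance]
    for i, (weights, bias) in enumerate(layers):
        v = linear(weights, bias, v)
        if i < len(layers) - 1:  # no activation on the output layer
            v = relu(v)
    return v[0]


rng = random.Random(0)
n_facial = 8  # hypothetical size of the facial expression feature vector
mlp = build_mlp(n_facial + 2, hidden=16, rng=rng)
toy_facial = [1.0 / n_facial] * n_facial  # made-up uniform expression scores
toy_score = nvi_forward(mlp, toy_facial, gesture_intensity=0.7, perceived_distance=0.3)
```

With untrained weights the output is meaningless; the sketch only shows how per-clip facial, gesture, and distance features could be concatenated and regressed to a single score.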