Toward Scalable Co-located Practical Learning: Assisting with Computer Vision and Multimodal Analytics

Xinyu Li, Linxuan Zhao, Roberto Martinez-Maldonado, Dragan Gasevic, Lixiang Yan

Abstract

This study examined whether a single ceiling-mounted camera could be used to capture fine-grained learning behaviours in co-located practical learning. In undergraduate nursing simulations, teachers first identified seven observable behaviour categories, which were then used to train a YOLO-based detector. Video data were collected from 52 sessions, and analyses focused on Scenario A because it produced greater behavioural variation than Scenario B. Annotation reliability was high (F1=0.933). On the held-out test set, the model achieved a precision of 0.789, a recall of 0.784, and an mAP@0.5 of 0.827. When only behaviour frequencies were compared, no robust differences were found between high- and low-performing groups. However, when behaviour labels were analysed together with spatial context, clear differences emerged in both task and collaboration performance. Higher-performing teams showed more patient interaction in the primary work area, whereas lower-performing teams showed more phone-related activity and more activity in secondary areas. These findings suggest that behavioural data are more informative when interpreted together with where they occur. Overall, the study shows that a single-camera computer vision approach can support the analysis of teamwork and task engagement in face-to-face practical learning without relying on wearable sensors.

Paper Structure (31 sections, 11 figures, 5 tables)

Figures (11)

  • Figure 1: Computer vision approach for learning behaviour detection
  • Figure 2: Teams of four students, and a teacher playing the role of a family member of the patient in Bed 3, in the specialised classroom space. The points and labels mark the centres of the different spaces of interest. The tracking boxes identify individuals and the learning actions that occurred in the simulation classroom. The blacked-out areas inside the tracking boxes protect individuals’ facial identities.
  • Figure 3: Model Training Results
  • Figure 4: Confusion Matrix (Normalised)
  • Figure 5: Precision Recall Curve
  • ...and 6 more figures