MOCAS: A Multimodal Dataset for Objective Cognitive Workload Assessment on Simultaneous Tasks
Wonse Jo, Ruiqi Wang, Su Sun, Revanth Krishna Senthilkumaran, Daniel Foti, Byung-Cheol Min
TL;DR
MOCAS is a realistic multimodal cognitive workload (CWL) dataset collected during CCTV monitoring tasks with a real multi-robot setup. For 21 participants, it integrates physiological signals (EEG, GSR, PPG, HR, SKT, ACC), behavioral data (facial video, eye aspect ratio (EAR), facial action units (AUs), mouse input), subjective and emotional annotations (NASA-TLX, ISA, SAM), and Big Five personality traits, to support robust CWL recognition in real-world human–machine systems. The authors validate data quality through correlation analyses and questionnaire results. They also report a baseline three-class CWL classifier based on LF-LSTM that achieves 72.3% accuracy in the trial-independent setting and 46.1% in the subject-independent setting. EEG_POW gives strong unimodal performance, and multimodal fusion yields the best results. The dataset (about 722.4 GB of raw data in 754 ROSbag2 files) and the preprocessing code are publicly available under controlled access, enabling researchers to benchmark multimodal CWL models and explore personalization and transfer-learning approaches for real-world deployment.
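The gap between the trial-independent and subject-independent accuracies comes from how the data are split for evaluation. The sketch below illustrates the two protocols on toy `(subject_id, trial_id)` pairs. It assumes that "trial-independent" means held-out trials can come from subjects who also appear in training. It also assumes that "subject-independent" means all trials of a held-out subject are excluded from training, i.e. leave-one-subject-out. Neither assumption is taken from the MOCAS release, and the function names are hypothetical.

```python
# Toy illustration of two evaluation splits (assumed protocols, not the
# authors' exact procedure). Each sample is a (subject_id, trial_id) pair.

def subject_independent_folds(samples):
    """Leave-one-subject-out: each fold holds out every trial of one subject,
    so the model is always tested on a person it has never seen."""
    subjects = sorted({s for s, _ in samples})
    folds = []
    for held_out in subjects:
        train = [x for x in samples if x[0] != held_out]
        test = [x for x in samples if x[0] == held_out]
        folds.append((train, test))
    return folds


def trial_independent_folds(samples):
    """Hold out one trial index across all subjects: test trials are unseen,
    but the same subjects also contribute training data."""
    trials = sorted({t for _, t in samples})
    folds = []
    for held_out in trials:
        train = [x for x in samples if x[1] != held_out]
        test = [x for x in samples if x[1] == held_out]
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    # Hypothetical toy data: 3 subjects x 4 trials (not MOCAS records).
    samples = [(s, t) for s in range(3) for t in range(4)]
    for name, folds in [("subject-independent", subject_independent_folds(samples)),
                        ("trial-independent", trial_independent_folds(samples))]:
        train, test = folds[0]
        print(name, "fold 0: train", len(train), "test", len(test))
```

Because subject-independent folds never let the model see the test person's physiology, lower accuracy in that setting is expected. This is also why the summary points to personalization and transfer learning.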
Abstract
This paper presents MOCAS, a multimodal dataset dedicated to human cognitive workload (CWL) assessment. In contrast to existing datasets based on virtual game stimuli, MOCAS was collected during realistic closed-circuit television (CCTV) monitoring tasks, which increases its applicability to real-world scenarios. To build MOCAS, two off-the-shelf wearable sensors and one webcam were used to collect physiological signals and behavioral features from 21 human subjects. After each task, participants reported their CWL by completing the NASA Task Load Index (NASA-TLX) and the Instantaneous Self-Assessment (ISA). Personal background (e.g., personality and prior experience) was surveyed using demographic and Big Five personality questionnaires. Two dimensions of subjective emotion (arousal and valence) were obtained from the Self-Assessment Manikin (SAM) and could serve as additional indicators for improving CWL recognition performance. Technical validation was conducted to demonstrate that the target CWL levels were elicited during the simultaneous CCTV monitoring tasks, and its results support the high quality of the collected multimodal signals.
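NASA-TLX combines six subscale ratings into a single workload score. The sketch below shows the two common scoring variants. The raw (unweighted) TLX is the mean of the six ratings. The weighted TLX scales each rating by how often that subscale won in the 15 pairwise comparisons. The abstract does not say which variant MOCAS reports, so both are shown. The subscale keys and function names are hypothetical. Ratings are assumed to be on a 0 to 100 scale and oriented so that higher means more workload; on the standard form the performance scale runs from good to poor.

```python
# Sketch of NASA-TLX scoring; keys and function names are illustrative only.
TLX_SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def _check_ratings(ratings: dict) -> None:
    """Ensure all six subscales are present and within 0-100."""
    for k in TLX_SUBSCALES:
        if k not in ratings:
            raise ValueError(f"missing subscale: {k}")
        if not 0 <= ratings[k] <= 100:
            raise ValueError(f"rating out of range for {k}: {ratings[k]}")


def raw_tlx(ratings: dict) -> float:
    """Raw TLX: unweighted mean of the six subscale ratings."""
    _check_ratings(ratings)
    return sum(ratings[k] for k in TLX_SUBSCALES) / len(TLX_SUBSCALES)


def weighted_tlx(ratings: dict, weights: dict) -> float:
    """Weighted TLX: weights are the tallies from the 15 pairwise
    comparisons (each 0-5, summing to 15); absent keys count as 0."""
    _check_ratings(ratings)
    tallies = {k: weights.get(k, 0) for k in TLX_SUBSCALES}
    if sum(tallies.values()) != 15:
        raise ValueError("pairwise-comparison tallies must sum to 15")
    return sum(ratings[k] * tallies[k] for k in TLX_SUBSCALES) / 15


if __name__ == "__main__":
    # Hypothetical ratings for one participant after one task.
    r = {"mental": 70, "physical": 10, "temporal": 60,
         "performance": 40, "effort": 65, "frustration": 35}
    w = {"mental": 5, "physical": 0, "temporal": 4,
         "performance": 2, "effort": 3, "frustration": 1}
    print(round(raw_tlx(r), 2))   # 46.67
    print(weighted_tlx(r, w))     # 60.0
```

The raw variant needs only the six ratings, while the weighted variant also requires each participant's pairwise-comparison tallies. A dataset user should therefore check which responses were recorded before choosing one.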
