Beyond Questionnaires: Video Analysis for Social Anxiety Detection
Nilesh Kumar Sahu, Nandigramam Sai Harshit, Rishabh Uikey, Haroon R. Lone
TL;DR
This study investigates video-based detection of Social Anxiety Disorder (SAD), using low-cost smartphone video to extract head pose, body pose, facial action units, and eye gaze while participants give an impromptu speech under a Trier Social Stress Test. The authors compare classical ML and deep learning methods on summarized features, sequential eye gaze features, and hybrid architectures, reporting a best accuracy of 83% with a 1D-CNN-DNN fusion model on AU+Eye Gaze+Body Pose. They introduce a public dataset, our-video-dataset, comprising 92 participants and 121 non-redundant features, which enables benchmarking of SAD detection from visual cues. The findings highlight eye gaze as a strong single predictor and indicate that deep learning and feature fusion offer substantial gains, suggesting a scalable, non-invasive tool for early SAD screening in real-world settings.
Abstract
Social Anxiety Disorder (SAD) significantly impacts individuals' daily lives and relationships. Conventional methods for SAD detection rely on in-person consultations and self-reported questionnaires, which are time-consuming and prone to bias. This paper introduces video analysis as a promising method for early SAD detection. Specifically, we present a new approach for detecting SAD from various bodily features extracted from video data. We conducted a study to collect video data from 92 participants giving an impromptu speech in a controlled environment. Using this data, we studied behavioral changes in participants' head pose, body pose, eye gaze, and facial action units. By applying a range of machine learning and deep learning algorithms, we achieved an accuracy of up to 74% in classifying participants as SAD or non-SAD. Video-based SAD detection offers a non-intrusive and scalable approach that can be deployed in real time, potentially enhancing early detection and intervention capabilities.
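As a rough illustration of the "summarized features" mentioned in the TL;DR, the sketch below reduces per-frame signals (for example a gaze angle or a head-pose angle) to a fixed-length vector of summary statistics that a classical classifier could consume. The function names, the signal names `gaze_x` and `head_yaw`, and the choice of mean, standard deviation, minimum, and maximum are assumptions made for illustration; the source does not specify which statistics the authors used.

```python
from statistics import mean, pstdev


def summarize_series(values):
    """Reduce one per-frame signal to a few summary statistics.

    The statistics chosen here are illustrative assumptions,
    not the authors' actual feature set.
    """
    if not values:
        raise ValueError("cannot summarize an empty series")
    return {
        "mean": mean(values),
        "std": pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


def summarize_recording(frames):
    """Turn a list of per-frame dicts into one flat feature dict.

    frames: list of dicts mapping a signal name (hypothetical,
    e.g. "gaze_x") to that signal's value in the frame.
    """
    if not frames:
        raise ValueError("recording has no frames")
    features = {}
    for name in sorted(frames[0]):
        stats = summarize_series([frame[name] for frame in frames])
        for stat_name, value in stats.items():
            features[f"{name}_{stat_name}"] = value
    return features


if __name__ == "__main__":
    # Two toy frames with hypothetical signal names.
    demo = [
        {"gaze_x": 0.0, "head_yaw": 1.0},
        {"gaze_x": 2.0, "head_yaw": 1.0},
    ]
    print(summarize_recording(demo))
```

Each recording, whatever its length, maps to the same set of feature keys, which is what lets recordings of different durations be compared by a single SAD versus non-SAD classifier.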
