ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videos
Maria Luísa Lima, Willams de Lima Costa, Estefania Talavera Martinez, Veronica Teichrieb
TL;DR
The paper addresses emotion recognition from gait, arguing that gait provides informative nonverbal cues beyond facial expressions. It proposes ST-Gait++, a skeleton-based spatio-temporal graph convolutional network operating on a 16-joint 3D skeleton with three ST-GCN++ blocks to classify four emotions. On the E-Gait dataset, it achieves approximately $87.5\%$ accuracy, about $5.4$ percentage points above the STEP baseline, and converges roughly $3.63\times$ faster in training. The work also discusses limitations of the dataset and bias/diversity considerations, highlighting practical implications for accessible gait-based emotion analysis and directions for more diverse open datasets.
Abstract
Emotion recognition is relevant to understanding human behaviour, and the computer vision community has widely explored facial expression and speech recognition for this purpose. Literature in behavioural psychology indicates that gait, the way a person walks, is an additional indicator of emotion. In this work, we propose a deep framework for emotion recognition through the analysis of gait. More specifically, our model is composed of a sequence of spatial-temporal Graph Convolutional Networks that produce a robust skeleton-based representation for the task of emotion classification. We evaluate the proposed framework on the E-Gait dataset, which comprises 2177 samples. Our results represent an improvement of approximately 5% in accuracy over the state of the art. In addition, our model converged faster during training than state-of-the-art methods.
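To make the idea of stacked spatial-temporal graph convolutions over a skeleton more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The 16 joints, three blocks, and four emotion classes come from the summary above; everything else is an assumption made for illustration: the chain-shaped skeleton graph, the channel widths, the temporal kernel size, the random untrained weights, and the names `normalized_adjacency`, `st_block`, and `classify`.

```python
import numpy as np

NUM_JOINTS = 16   # from the text: 16-joint 3D skeleton
NUM_CLASSES = 4   # from the text: four emotion classes

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def st_block(x, A_hat, w_spatial, w_temporal):
    """One simplified spatial-temporal block.

    x:          (T, V, C_in) joint features over T frames and V joints
    w_spatial:  (C_in, C_out) channel-mixing weights
    w_temporal: (K, C_out, C_out) temporal kernel, K odd
    """
    # Spatial step: average each joint with its graph neighbours, then mix channels.
    h = np.einsum("uv,tvc->tuc", A_hat, x) @ w_spatial
    h = np.maximum(h, 0.0)  # ReLU
    # Temporal step: 1D convolution along the frame axis with zero padding.
    K = w_temporal.shape[0]
    pad = K // 2
    T = h.shape[0]
    hp = np.pad(h, ((pad, pad), (0, 0), (0, 0)))
    out = sum(hp[k:k + T] @ w_temporal[k] for k in range(K))
    return np.maximum(out, 0.0)

def classify(x, A_hat, params, w_out):
    """Stack the blocks, pool over frames and joints, and apply a softmax."""
    h = x
    for w_s, w_t in params:
        h = st_block(h, A_hat, w_s, w_t)
    pooled = h.mean(axis=(0, 1))          # global average pooling -> (C,)
    logits = pooled @ w_out
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)

# ASSUMPTION: a simple chain graph stands in for the real skeleton topology.
edges = [(i, i + 1) for i in range(NUM_JOINTS - 1)]
A_hat = normalized_adjacency(edges, NUM_JOINTS)

# ASSUMPTION: 48 frames of 3D joint coordinates, filled with random values.
x = rng.normal(size=(48, NUM_JOINTS, 3))

# Three blocks, as in the text; the channel widths are hypothetical.
channels = [3, 16, 32, 64]
params = [
    (rng.normal(scale=0.1, size=(c_in, c_out)),
     rng.normal(scale=0.1, size=(3, c_out, c_out)))
    for c_in, c_out in zip(channels[:-1], channels[1:])
]
w_out = rng.normal(scale=0.1, size=(channels[-1], NUM_CLASSES))

probs = classify(x, A_hat, params, w_out)
print(probs.shape, float(probs.sum()))
```

With untrained weights the output probabilities carry no meaning; the sketch only shows how per-frame joint coordinates flow through graph aggregation, temporal convolution, and pooling to yield one score per emotion class.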
