MF-GCN: A Multi-Frequency Graph Convolutional Network for Tri-Modal Depression Detection Using Eye-Tracking, Facial, and Acoustic Features
Sejuti Rahman, Swakshar Deb, MD. Sameer Iqbal Chowdhury, MD. Jubair Ahmed Sourov, Mohammad Shamsuddin
TL;DR
This work tackles objective depression detection by combining eye-tracking, audio, and video data in a new gold-standard dataset and by introducing MF-GCN, a graph convolutional network built around a Multi-Frequency Filter Bank Module (MFFBM). The method comprises a unimodal feature extractor for each modality and a cross-modal graph neural network that learns from both low- and high-frequency spectral information, yielding cross-modal representations that improve classification. Empirically, MF-GCN achieves a sensitivity of 0.96 and an F2 score of 0.94 in binary depression detection, a sensitivity of 0.79 and a specificity of 0.87 in three-class classification, and generalizes to the CMDC dataset (sensitivity 0.95, F2 score 0.96). Theoretical analysis shows that the MFFBM can realize arbitrary spectral filters, which addresses the fixed low-pass limitation of existing graph-based models and supports the method’s robustness for multimodal mental-health assessment.
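To make the low- versus high-frequency distinction concrete, the following is a minimal sketch of a two-filter bank on a small graph. It is not the authors' MFFBM: the function names, the mixing weights `w_low` and `w_high`, and the three-node example graph are all assumptions chosen for illustration. It uses the standard GCN-style normalized adjacency as the low-pass filter and its complement as the high-pass filter.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def two_filter_bank(adj, x, w_low=1.0, w_high=1.0):
    """Hypothetical two-filter bank (illustrative only, not the paper's module).

    low  = A_norm @ x        smooths each node towards its neighbours
    high = (I - A_norm) @ x  keeps each node's deviation from that smoothed value
    In a trained model the mixing weights would be learned; here they are fixed.
    """
    a_norm = normalized_adjacency(adj)
    low = a_norm @ x
    high = x - low
    return w_low * low + w_high * high, low, high

# Assumed toy graph: three fully connected nodes, one scalar feature per node.
adj = np.ones((3, 3)) - np.eye(3)
x = np.array([1.0, 2.0, 6.0])
mixed, low, high = two_filter_bank(adj, x)
```

With equal weights the two bands sum back to the input, so nothing is discarded. A model restricted to the low-pass band alone would see only the smoothed signal, which is the limitation the multi-frequency design is meant to avoid.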
Abstract
Depression is a prevalent global mental health disorder, characterised by persistent low mood and anhedonia. However, it remains underdiagnosed because current diagnostic methods depend heavily on subjective clinical assessments. To enable objective detection, we introduce a gold-standard dataset of 103 clinically assessed participants, collected through a tri-modal approach that integrates eye-tracking data with audio and video to give a comprehensive representation of depressive symptoms. Eye-tracking data quantify the attentional bias towards negative stimuli that is frequently observed in depressed groups. Audio and video data capture the affective flattening and psychomotor retardation characteristic of depression. Statistical validation confirmed that these features have significant discriminative power in distinguishing depressed from non-depressed groups. We address a critical limitation of existing graph-based models, which focus on low-frequency information, and propose a Multi-Frequency Graph Convolutional Network (MF-GCN). This framework includes a novel Multi-Frequency Filter Bank Module (MFFBM), which can leverage both low- and high-frequency signals. Extensive evaluation against traditional machine learning algorithms and deep learning frameworks demonstrates that MF-GCN consistently outperforms the baselines. In binary classification, the model achieved a sensitivity of 0.96 and an F2 score of 0.94. For the three-class classification task, the proposed method achieved a sensitivity of 0.79 and a specificity of 0.87, significantly surpassing the other models. To validate generalizability, the model was also evaluated on the Chinese Multimodal Depression Corpus (CMDC) dataset, where it achieved a sensitivity of 0.95 and an F2 score of 0.96. These results confirm that our tri-modal, multi-frequency framework effectively captures cross-modal interactions for accurate depression detection.
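For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and the F2 score follow from binary confusion-matrix counts. The helper names and the counts in the usage lines are hypothetical and are not taken from the paper's results. The F-beta formula used is the standard one, with beta = 2 weighting recall (sensitivity) more heavily than precision, which suits screening tasks where missed cases are costly.

```python
def sensitivity(tp, fn):
    """Fraction of truly positive cases that are detected (recall)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of truly negative cases that are correctly rejected."""
    return tn / (tn + fp)

def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from counts: (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP)."""
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

# Made-up counts for illustration only (not the paper's data).
tp, fn, fp, tn = 8, 2, 4, 6
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
f2 = f_beta(tp, fp, fn, beta=2.0)
```

These helpers assume non-zero denominators; a production implementation would need to handle empty classes explicitly.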
