
MF-GCN: A Multi-Frequency Graph Convolutional Network for Tri-Modal Depression Detection Using Eye-Tracking, Facial, and Acoustic Features

Sejuti Rahman, Swakshar Deb, MD. Sameer Iqbal Chowdhury, MD. Jubair Ahmed Sourov, Mohammad Shamsuddin

TL;DR

This work tackles the challenge of objective depression detection by combining eye-tracking, audio, and video data in a new gold-standard dataset and by introducing MF-GCN, built around a Multi-Frequency Filter Bank Module (MFFBM). The method comprises a unimodal feature extractor for each modality and a cross-modal graph neural network that learns from both low- and high-frequency spectral information, yielding cross-modal representations that improve classification. Empirically, MF-GCN achieves a sensitivity of 0.96 and an F2-score of 0.94 in binary depression detection, a sensitivity of 0.79 and a specificity of 0.87 in three-class classification, and strong generalization on the CMDC dataset (sensitivity 0.95, F2-score 0.96). Theoretical analysis shows that the MFFBM can realize arbitrary spectral filters, overcoming the fixed low-pass behavior of existing graph-based models and underscoring the method’s robustness for multimodal mental-health assessment.
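The reported metrics all derive from a confusion matrix. Sensitivity is $TP/(TP+FN)$, specificity is $TN/(TN+FP)$, and the F2-score is the F-beta score with $\beta = 2$, $F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}$, which weights recall (sensitivity) more heavily than precision and therefore suits screening settings where missed cases are costly. The function below is a minimal plain-Python sketch for the binary case; the function name and the convention that label 1 means "depressed" are assumptions for illustration, and how the paper averages these metrics for the three-class task is not stated in this summary.

```python
def binary_screening_metrics(y_true, y_pred, positive=1, beta=2.0):
    """Sensitivity, specificity, precision and F-beta for a binary task.

    Hypothetical helper: `positive` marks the class treated as "depressed".
    """
    pairs = list(zip(y_true, y_pred))
    # Booleans sum as 0/1, giving the four confusion-matrix counts.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if tp + fp else 0.0

    # F-beta: beta = 2 weights recall four times as much as precision.
    b2 = beta ** 2
    denom = b2 * precision + sensitivity
    f_beta = (1 + b2) * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f_beta": f_beta}


# Toy labels (hypothetical, not the paper's data): 1 = depressed, 0 = not depressed.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1]
print(binary_screening_metrics(y_true, y_pred))
```

In this toy case, labelling every participant as depressed yields perfect sensitivity and zero specificity, yet the F2-score is still about 0.91, which is why F2 is best read alongside specificity.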

Abstract

Depression is a prevalent global mental health disorder, characterised by persistent low mood and anhedonia. However, it remains underdiagnosed because current diagnostic methods depend heavily on subjective clinical assessments. To enable objective detection, we introduce a gold-standard dataset of 103 clinically assessed participants, collected through a tripartite approach that uniquely integrates eye-tracking data with audio and video to give a comprehensive representation of depressive symptoms. Eye-tracking data quantify the attentional bias toward negative stimuli that is frequently observed in depressed groups. Audio and video data capture the affective flattening and psychomotor retardation characteristic of depression. Statistical validation confirmed that these modalities have significant discriminative power in distinguishing depressed from non-depressed groups. We address a critical limitation of existing graph-based models, which focus on low-frequency information, and propose a Multi-Frequency Graph Convolutional Network (MF-GCN). The framework is built around a novel Multi-Frequency Filter Bank Module (MFFBM), which can leverage both low- and high-frequency signals. Extensive evaluation against traditional machine learning algorithms and deep learning frameworks demonstrates that MF-GCN consistently outperforms the baselines. In binary classification, the model achieved a sensitivity of 0.96 and an F2-score of 0.94. For the three-class classification task, the proposed method achieved a sensitivity of 0.79 and a specificity of 0.87, significantly surpassing the other models. To validate generalizability, the model was also evaluated on the Chinese Multimodal Depression Corpus (CMDC) dataset, where it achieved a sensitivity of 0.95 and an F2-score of 0.96. These results confirm that our trimodal, multi-frequency framework effectively captures cross-modal interactions for accurate depression detection.
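The core idea behind a multi-frequency filter bank is that a purely low-pass graph convolution keeps smooth signals but discards oscillating ones, while a high-pass branch does the opposite, so mixing weighted branches lets a model use both. The NumPy sketch below illustrates only that idea; it is not the paper's MFFBM. The function `two_band_filter_bank`, its fixed kernels $\mathbf{I} - \tfrac{1}{2}\mathcal{L}$ (low-pass) and $\tfrac{1}{2}\mathcal{L}$ (high-pass), and the scalar weights `w_low` and `w_high` are all hypothetical choices made for illustration.

```python
import numpy as np


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep 0
    return np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def two_band_filter_bank(x, lap, w_low, w_high):
    """Hypothetical two-branch bank: weighted sum of low- and high-pass outputs."""
    n = lap.shape[0]
    low = (np.eye(n) - 0.5 * lap) @ x  # response 1 - lambda/2: keeps smooth parts
    high = (0.5 * lap) @ x             # response lambda/2: keeps oscillating parts
    return w_low * low + w_high * high


# 4-node cycle: a regular graph, so the constant vector has lambda = 0
# and the alternating vector has lambda = 2.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = normalized_laplacian(A)
smooth = np.ones(4)
alternating = np.array([1.0, -1.0, 1.0, -1.0])

print(two_band_filter_bank(smooth, L, 1.0, 0.0))       # low-pass branch keeps it
print(two_band_filter_bank(alternating, L, 1.0, 0.0))  # low-pass branch removes it
print(two_band_filter_bank(alternating, L, 0.0, 1.0))  # high-pass branch keeps it
```

Because both branches here are first-order polynomials in $\mathcal{L}$, any weighted mix has a frequency response that is linear in $\lambda$; a bank intended to approximate arbitrary spectral filters, as the paper's theoretical analysis claims for the MFFBM, would need richer branches, for example higher-order polynomials in $\mathcal{L}$.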

Paper Structure

This paper contains 29 sections, 3 theorems, 19 equations, 10 figures, 7 tables.

Key Result

Lemma 1

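The spectral behavior of a graph convolution kernel $\mathbf{C}$ can be described by its frequency response in the spectral domain, given by:

$$\mathcal{F}(\lambda) = \operatorname{diag}\left(\mathbf{U}^{\top}\mathbf{C}\,\mathbf{U}\right),$$

where $\mathcal{F}(\lambda)$ represents the frequency response, $\operatorname{diag}(\cdot)$ collects the diagonal entries of its matrix argument, $\mathbf{U}$ is the matrix of eigenvectors of the normalized graph Laplacian $\mathcal{L}$, and $\mathbf{C}$ is the graph convolution operator.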
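To make Lemma 1 concrete, the NumPy sketch below builds the normalized Laplacian $\mathcal{L} = \mathbf{I} - \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$ of a small path graph, takes its eigenvectors $\mathbf{U}$, and reads off a kernel's frequency response as the diagonal of $\mathbf{U}^{\top}\mathbf{C}\mathbf{U}$ (assumed here to be the form the lemma uses, as in standard spectral analyses of GNNs). The kernels $\mathbf{C}_{\text{low}} = \mathbf{I} - \tfrac{1}{2}\mathcal{L}$ and $\mathbf{C}_{\text{high}} = \tfrac{1}{2}\mathcal{L}$ and the helper names are hypothetical choices for illustration, not the paper's operators. Because they are polynomials in $\mathcal{L}$, $\mathbf{U}^{\top}\mathbf{C}\mathbf{U}$ is exactly diagonal and the responses are $1 - \lambda/2$ and $\lambda/2$.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep 0
    return np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def frequency_response(kernel: np.ndarray, lap: np.ndarray):
    """Eigenvalues of L and the diagonal of U^T C U (the kernel's response)."""
    lam, U = np.linalg.eigh(lap)  # ascending eigenvalues, orthonormal eigenvectors
    return lam, np.diag(U.T @ kernel @ U)


# Illustrative 5-node path graph (hypothetical example, not from the paper).
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0

L = normalized_laplacian(A)
I = np.eye(5)
C_low = I - 0.5 * L  # hypothetical low-pass kernel: F(lambda) = 1 - lambda/2
C_high = 0.5 * L     # hypothetical high-pass kernel: F(lambda) = lambda/2

lam, F_low = frequency_response(C_low, L)
_, F_high = frequency_response(C_high, L)
for l, fl, fh in zip(lam, F_low, F_high):
    print(f"lambda={l:.3f}  low-pass={fl:.3f}  high-pass={fh:.3f}")
```

Small eigenvalues correspond to smooth signals on the graph: the low-pass kernel passes them and attenuates components near $\lambda = 2$, while the high-pass kernel does the opposite.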

Figures (10)

  • Figure 1: Histogram illustrating the distribution of Patient Health Questionnaire-9 (PHQ-9) scores among subjects
  • Figure 2: Violin plot illustrating the distribution of Patient Health Questionnaire-9 (PHQ-9) scores for male and female subjects.
  • Figure 3: Illustration of the overall workflow for the data collection conducted at the two collection sites: The National Institute of Mental Health (NIMH) and the University of Dhaka (DU). The data are collected, cleaned for anomalies and missing data, annotated, and separated into the three classes.
  • Figure 4: Demographic characteristics and PHQ-9 score distribution (N=105)
  • Figure 4: The three combinations of images shown to the subjects: (top-left) Sad-Neutral, (top-right) Neutral-Happy, and (bottom) Happy-Sad.
  • ...and 5 more figures

Theorems & Definitions (4)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Proof