Enhancing Depression Detection via Question-wise Modality Fusion
Aishik Mandal, Dana Atzil-Slonim, Thamar Solorio, Iryna Gurevych
TL;DR
The paper tackles automated depression severity assessment from multimodal data by introducing QuestMF, a framework that performs question-wise (per PHQ-8 item) modality fusion and outputs a score for each question. This design addresses two aspects that prior work tends to overlook: how much each modality contributes to each question, and the ordinal nature of the labels. QuestMF integrates turn-based encoders for text, audio, and video and uses cross-attention fusion to build a fused representation for each question. It is trained with the Imbalanced Ordinal Log-Loss (ImbOLL) to handle label imbalance. Empirically, QuestMF trained with ImbOLL performs comparably to state-of-the-art methods on the E-DAIC dataset while enhancing interpretability through per-question predictions. The method holds promise for clinician-guided, symptom-specific interventions and can be extended to other questionnaires and to longitudinal clinical data.
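The question-wise fusion idea can be illustrated with a small sketch. It assumes each modality (text, audio, video) has already been reduced to a single fixed-size vector by its turn-based encoder. It also replaces full cross-attention with a simplified version: one learned query per PHQ-8 question attends over the three modality vectors, with no key/value projections. All names (`fuse_for_question`, `questmf_sketch`, the toy size `DIM`) are hypothetical, and the parameters are random rather than trained. This is not the authors' implementation. Each PHQ-8 item is scored 0 to 3, so the eight per-question scores sum to a total between 0 and 24.

```python
import math
import random

NUM_QUESTIONS = 8   # PHQ-8 has eight items
NUM_CLASSES = 4     # each item is scored 0, 1, 2 or 3
DIM = 4             # toy embedding size (hypothetical)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fuse_for_question(query, modality_embs):
    """Scaled dot-product attention of one question query over modality vectors.

    Simplified stand-in for cross-attention (assumption): no key/value projections.
    """
    scores = [dot(query, m) / math.sqrt(len(query)) for m in modality_embs]
    weights = softmax(scores)
    fused = [sum(w * m[d] for w, m in zip(weights, modality_embs))
             for d in range(len(query))]
    return fused, weights


def predict_item(fused, head):
    """Linear classifier head over the fused vector; returns the argmax score 0..3."""
    probs = softmax([dot(w, fused) for w in head])
    return max(range(len(probs)), key=probs.__getitem__)


def questmf_sketch(modality_embs, queries, heads):
    """Fuse modalities separately for each question, then score each question."""
    item_scores, all_weights = [], []
    for q in range(NUM_QUESTIONS):
        fused, weights = fuse_for_question(queries[q], modality_embs)
        item_scores.append(predict_item(fused, heads[q]))
        all_weights.append(weights)
    return item_scores, sum(item_scores), all_weights


# Toy usage with random (untrained) parameters.
random.seed(0)
modalities = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]  # text, audio, video
queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUESTIONS)]
heads = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_CLASSES)]
         for _ in range(NUM_QUESTIONS)]
items, total, attn = questmf_sketch(modalities, queries, heads)
print("per-question scores:", items, "total:", total)
```

In a trained model of this shape, the per-question attention weights in `attn` would indicate how strongly each modality informs each item. The per-item scores would then point to specific symptoms, rather than only an overall severity total.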
Abstract
Depression is a highly prevalent and disabling condition that incurs substantial personal and societal costs. Current depression diagnosis involves determining a person's depression severity through self-reported questionnaires or clinician-conducted interviews. This often delays treatment and requires substantial human resources. Thus, several works try to automate the process using multimodal data. However, they usually overlook two aspects: i) the variable contribution of each modality to each question in the questionnaire and ii) the ordinal nature of the task, which calls for ordinal classification. This results in sub-optimal fusion and training methods. In this work, we propose a novel Question-wise Modality Fusion (QuestMF) framework trained with a novel Imbalanced Ordinal Log-Loss (ImbOLL) function to tackle these issues. Our framework performs comparably to the current state-of-the-art models on the E-DAIC dataset and enhances interpretability by predicting a score for each question. This will help clinicians identify an individual's symptoms and customise their interventions accordingly. We also make the code for the QuestMF framework publicly available.
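The abstract names ImbOLL but does not give its formula. The hedged sketch below illustrates why an ordinal loss matters. It implements the Ordinal Log-Loss (OLL) from the ordinal classification literature, which penalises probability placed on each wrong class in proportion to that class's distance from the true class. It then pairs OLL with inverse-frequency class weights as one assumed way to counter label imbalance. The weighting scheme and all function names are assumptions for illustration; the paper's ImbOLL may be defined differently.

```python
import math
from collections import Counter


def oll_loss(probs, target, alpha=1.0, eps=1e-12):
    """Ordinal Log-Loss for one example.

    Penalises probability mass on each wrong class, weighted by its distance
    |i - target| ** alpha from the true class.
    """
    loss = 0.0
    for i, p in enumerate(probs):
        dist = abs(i - target) ** alpha
        if dist == 0:
            continue  # the true class is not penalised
        loss -= math.log(max(1.0 - p, eps)) * dist
    return loss


def inverse_frequency_weights(labels, num_classes):
    """ASSUMPTION: inverse-frequency class weights as the imbalance correction."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def weighted_oll(batch_probs, targets, class_weights, alpha=1.0):
    """Mean class-weighted OLL over a batch (hypothetical stand-in for ImbOLL)."""
    total = sum(class_weights[t] * oll_loss(p, t, alpha)
                for p, t in zip(batch_probs, targets))
    return total / len(targets)


# Two wrong predictions for a true item score of 0. Both give the true class 0.2,
# so plain cross-entropy (-log 0.2) cannot tell them apart; OLL can.
near = [0.2, 0.7, 0.1, 0.0]   # most mass one step away
far = [0.2, 0.0, 0.1, 0.7]    # most mass three steps away
print(oll_loss(near, 0), oll_loss(far, 0))

labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3]  # toy labels skewed toward score 0
weights = inverse_frequency_weights(labels, 4)
print(weights)
print(weighted_oll([near, far], [0, 0], weights))
```

Raising `alpha` above 1 penalises distant mistakes more steeply. In this toy label set, the class weights also raise the cost of errors on the rarer, higher item scores.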
