Towards Explainable Multimodal Depression Recognition for Clinical Interviews

Wenjie Zheng, Qiming Xie, Zengzhi Wang, Jianfei Yu, Rui Xia

TL;DR

This work tackles the interpretability gap in multimodal depression recognition during clinical interviews by introducing the Explainable Multimodal Depression Recognition for Clinical Interviews (EMDRC) task. EMDRC first generates a structured symptom summary rooted in the eight PHQ-8 items and then predicts depression severity, enabling symptom-level explanations aligned with clinical practice. To support this, the authors construct the DAIC-Explain dataset with expert-annotated symptom summaries, and propose two complementary approaches: a training-based PHQ-aware multimodal multitask framework (PhqMML) and a training-free PHQ-guided prompting method (PhqCoT). Empirical results show that PhqMML improves both symptom-summary quality and severity prediction over baselines, and PhqCoT achieves strong zero-shot performance, highlighting the value of interpretable intermediate reasoning. The work also discusses limitations (data size/diversity, annotation bias) and outlines future directions toward broader disorders, multilingual contexts, and more trustworthy clinical deployment.

Abstract

Multimodal depression recognition for clinical interviews (MDRC) has recently attracted considerable attention. Existing MDRC studies mainly focus on improving task performance and have made significant progress. However, for clinical applications, model transparency is critical, and previous works ignore the interpretability of decision-making processes. To address this issue, we propose an Explainable Multimodal Depression Recognition for Clinical Interviews (EMDRC) task, which aims to provide evidence for depression recognition by summarizing symptoms and uncovering underlying causes. Given an interviewer-participant interaction scenario, the goal of EMDRC is to produce a structured summary of the participant's symptoms based on the eight-item Patient Health Questionnaire depression scale (PHQ-8), and to predict their depression severity. To tackle the EMDRC task, we construct a new dataset based on an existing MDRC dataset. Moreover, we utilize the PHQ-8 and propose a PHQ-aware multimodal multi-task learning framework, which captures utterance-level symptom-related semantic information to help generate a dialogue-level summary. Experimental results on our annotated dataset demonstrate the superiority of our proposed methods over baseline systems on the EMDRC task.
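To make the PHQ-8 structure behind the task concrete, the following is a minimal sketch of how a structured symptom summary and a questionnaire-based severity band could be represented. It is not the authors' model or data schema: the item labels, the `SymptomEntry` and `summarize` names, and the example evidence strings are all hypothetical. The only external fact used is the standard PHQ-8 scoring convention (eight items scored 0 to 3, total 0 to 24, with bands at 5, 10, 15, and 20); the severity classes used in the paper may differ.

```python
from dataclasses import dataclass
from typing import Dict, List

# Short labels for the eight PHQ-8 items (illustrative names, not the paper's schema).
PHQ8_ITEMS = (
    "no_interest", "depressed", "sleep", "tired",
    "appetite", "failure", "concentrating", "moving",
)


@dataclass
class SymptomEntry:
    """One line of a structured symptom summary (hypothetical layout)."""
    item: str           # one of PHQ8_ITEMS
    score: int          # PHQ-8 item score, 0 (not at all) to 3 (nearly every day)
    evidence: str = ""  # interview segment supporting the symptom
    cause: str = ""     # possible underlying cause


def severity_band(total: int) -> str:
    """Map a PHQ-8 total (0-24) to the standard severity band."""
    if not 0 <= total <= 24:
        raise ValueError("PHQ-8 total must be between 0 and 24")
    if total <= 4:
        return "none/minimal"
    if total <= 9:
        return "mild"
    if total <= 14:
        return "moderate"
    if total <= 19:
        return "moderately severe"
    return "severe"


def summarize(entries: List[SymptomEntry]) -> Dict[str, object]:
    """Combine symptom entries into a total score and severity band."""
    scores = {item: 0 for item in PHQ8_ITEMS}  # unmentioned items count as 0
    for entry in entries:
        if entry.item not in scores:
            raise ValueError(f"unknown PHQ-8 item: {entry.item}")
        if not 0 <= entry.score <= 3:
            raise ValueError("item scores must be between 0 and 3")
        scores[entry.item] = entry.score
    total = sum(scores.values())
    present = [e for e in entries if e.score > 0]
    return {"total": total, "severity": severity_band(total), "symptoms": present}


if __name__ == "__main__":
    # Made-up example entries, for illustration only.
    example = [
        SymptomEntry("sleep", 3, "says they wake up several times a night", "job loss"),
        SymptomEntry("tired", 2, "reports low energy most days"),
        SymptomEntry("concentrating", 2, "has trouble focusing on reading"),
    ]
    result = summarize(example)
    print(result["total"], result["severity"])
```

In EMDRC itself the severity is predicted by a model from the interview rather than summed from self-reported scores; the sketch only illustrates the questionnaire structure that the symptom summary is organized around.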

Paper Structure

This paper contains 27 sections, 10 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: The overview of our Explainable MDRC (EMDRC) task, which utilizes the PHQ-8 as a central pivot. Given a clinical interview scenario between an interviewer and a participant, EMDRC first generates a structured summary of the participant's symptoms, derived from eight PHQ items, and then predicts the participant's depression severity.
  • Figure 2: A participant's structured symptom summary annotation process. For each symptom in the participant’s self-reported PHQ-8 results, annotators are required to locate the corresponding segments in the conversation and assess the possible underlying causes. Moreover, any new symptoms reflected in the conversation will be added to the symptom summary annotation (e.g., Concentration Problem in the example above).
  • Figure 3: Symptom summary annotation analysis. (a) The 12 most frequent causes in the symptom summaries, categorized into five domains (price2002links; tennant2002life): Life Events and Environmental Adaptation, Economic and Career Stress, Health Issues, Personal Psychological and Emotional Regulation, and Others. (b) Distribution of symptom summary lengths in number of words.
  • Figure 4: The overview of our proposed PHQ-aware multimodal multi-task learning framework (PhqMML).
  • Figure 5: Ablation studies of PhqMML across different modalities and the auxiliary IC module.
  • ...and 3 more figures