Explainable Depression Detection in Clinical Interviews with Personalized Retrieval-Augmented Generation

Linhai Zhang, Ziyang Gao, Deyu Zhou, Yulan He

TL;DR

This paper tackles the need for explainable depression detection from clinical interviews by introducing RED, a Retrieval-Augmented Generation framework that grounds predictions in retrieved transcripts. RED personalizes retrieval via inferred user profiles and augments LLM judgments with event-centric social-intelligence knowledge to improve accuracy and interpretability. Empirical results on the DAIC-WoZ dataset show RED outperforms neural and LLM baselines, with notable gains for the depressed class and improvements in explanation quality and calibration. The approach advances practical mental-health assessment by delivering transparent, evidence-grounded predictions suitable for real-world clinical support while emphasizing responsible use and fairness.

Abstract

Depression is a widespread mental health disorder, and clinical interviews are the gold standard for assessment. However, their reliance on scarce professionals highlights the need for automated detection. Current systems mainly employ black-box neural networks that lack interpretability, a crucial property in mental health contexts. Some attempts to improve interpretability use post-hoc LLM generation but suffer from hallucination. To address these limitations, we propose RED, a Retrieval-augmented generation framework for Explainable depression Detection. RED retrieves evidence from clinical interview transcripts, providing explanations for predictions. Traditional query-based retrieval systems use a one-size-fits-all approach, which may not be optimal for depression detection, as user backgrounds and situations vary. We introduce a personalized query generation module that combines standard queries with user-specific background inferred by LLMs, tailoring retrieval to individual contexts. Additionally, to enhance LLM performance in social intelligence, we augment LLMs by retrieving relevant knowledge from a social intelligence datastore using an event-centric retriever. Experimental results on the real-world benchmark demonstrate RED's effectiveness compared to neural networks and LLM-based baselines.
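
The personalized retrieval idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine scorer stands in for whatever retriever RED actually uses, the function names (personalize_query, retrieve) are hypothetical, the user background is hand-written here whereas RED infers it with an LLM, and the transcript lines are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def personalize_query(standard_query, user_background):
    """Combine a standard query with user-specific background.

    Assumption: simple concatenation; the paper's combination
    strategy may differ.
    """
    return f"{standard_query} {user_background}"


def retrieve(query, utterances, top_k=2):
    """Return up to top_k utterances with non-zero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(u))), u) for u in utterances]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [u for score, u in scored[:top_k] if score > 0]


# Invented transcript lines, for illustration only.
transcript = [
    "I moved to the city last year for a new job",
    "I have trouble sleeping since I lost my job",
    "My favorite food is pizza",
    "I feel tired most days and cannot sleep well",
]

standard_query = "trouble sleeping"
background = "recently lost job"  # hand-written stand-in for an LLM-inferred profile

generic = retrieve(standard_query, transcript)
evidence = retrieve(personalize_query(standard_query, background), transcript)
```

In this toy setting the generic query matches only the utterance that mentions sleeping, while the personalized query also surfaces the job-related utterance, so the retrieved evidence reflects the user's situation as well as the symptom. A real system would presumably use dense embeddings rather than word overlap, which would also catch paraphrases such as "cannot sleep well".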

Paper Structure

This paper contains 34 sections, 7 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: Comparison between different depression detection methods. Most methods focus on improving performance while ignoring explanation. Some work generates post-hoc explanations with LLMs but suffers from hallucination. Our work employs a RAG-based framework to retrieve supporting evidence from the dialogue, which serves as the explanation for the predictions.
  • Figure 2: Overview of RED, which consists of (a) the adaptive RAG framework with two important modules, (b) the Personal Query Generation module, and (c) the Social Intelligence Enhancement module.
  • Figure 3: Case study for user #409. Texts containing personal identification information are removed. Texts in green indicate the important information for prediction, and texts in red indicate the actual scores.
  • Figure 4: Prompt Template for Direct Prompting
  • Figure 5: Prompt Template for Naive/Personal Prompt Retrieval
  • ...and 4 more figures