Explainable Depression Detection in Clinical Interviews with Personalized Retrieval-Augmented Generation
Linhai Zhang, Ziyang Gao, Deyu Zhou, Yulan He
TL;DR
This paper tackles the need for explainable depression detection from clinical interviews by introducing RED, a Retrieval-Augmented Generation framework that grounds predictions in retrieved transcripts. RED personalizes retrieval via inferred user profiles and augments LLM judgments with event-centric social-intelligence knowledge to improve accuracy and interpretability. Empirical results on the DAIC-WoZ dataset show RED outperforms neural and LLM baselines, with notable gains for the depressed class and improvements in explanation quality and calibration. The approach advances practical mental-health assessment by delivering transparent, evidence-grounded predictions suitable for real-world clinical support while emphasizing responsible use and fairness.
Abstract
Depression is a widespread mental health disorder, and clinical interviews are the gold standard for its assessment. However, their reliance on scarce professionals highlights the need for automated detection. Current systems mainly employ black-box neural networks, which lack the interpretability that is crucial in mental health contexts. Some attempts to improve interpretability use post-hoc LLM generation but suffer from hallucination. To address these limitations, we propose RED, a Retrieval-augmented generation framework for Explainable depression Detection. RED retrieves evidence from clinical interview transcripts, providing explanations for its predictions. Traditional query-based retrieval systems use a one-size-fits-all approach, which may not be optimal for depression detection because user backgrounds and situations vary. We introduce a personalized query generation module that combines standard queries with user-specific background inferred by LLMs, tailoring retrieval to individual contexts. Additionally, to enhance the social intelligence of LLMs, we augment them with relevant knowledge retrieved from a social intelligence datastore using an event-centric retriever. Experimental results on a real-world benchmark demonstrate RED's effectiveness compared with neural network and LLM-based baselines.
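
To make the idea of personalized query generation concrete, the following is a minimal sketch and not the RED implementation. It assumes a toy bag-of-words cosine retriever in place of a learned one, a hard-coded background string in place of an LLM-inferred user profile, and simple string concatenation as the way a standard query and the background are combined. All function names, the example transcript, and the queries are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def personalize_query(standard_query, user_background):
    """Combine a generic query with user-specific background.

    Assumption: plain concatenation; the paper's module may combine them differently.
    """
    return f"{standard_query} {user_background}"


def retrieve(query, utterances, k=2):
    """Return the k utterances most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(u))), u) for u in utterances]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [u for _, u in scored[:k]]


# Hypothetical interview snippets, invented for illustration only.
transcript = [
    "I feel tired most days",
    "My sleep has been bad lately",
    "Since the layoff I stopped seeing friends",
    "I like to cook pasta on weekends",
]

standard_query = "sleep problems and feeling tired"
# Stand-in for a background that an LLM would infer from the interview.
background = "layoff; withdrawing from friends"

generic = retrieve(standard_query, transcript, k=2)
personalized = retrieve(personalize_query(standard_query, background), transcript, k=2)

print("generic:     ", generic)
print("personalized:", personalized)
```

In this toy setting the generic query retrieves only the fatigue and sleep utterances, while the personalized query ranks the layoff-related utterance first, which is the kind of user-specific evidence a one-size-fits-all query can miss.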
