Enhancing Depression Diagnosis with Chain-of-Thought Prompting
Elysia Shi, Adithri Manda, London Chowdhury, Runeema Arun, Kevin Zhu, Michael Lam
TL;DR
This work investigates whether chain-of-thought (CoT) prompting improves the accuracy of PHQ-8 depression scores estimated by AI. Comparing a control prompt with a zero-shot CoT prompt on the DAIC-WOZ dataset using GPT-3.5-turbo, the authors find that CoT reasoning yields scores closer to participants' self-reported values, although the statistical significance of this improvement has yet to be robustly established. The study highlights improved interpretability and potential clinical value for AI-assisted depression screening, while acknowledging limitations in dataset diversity and model scope, as well as ethical considerations. The findings suggest a path toward more transparent and accessible mental health diagnostics, contingent on broader validation and guidelines for responsible deployment.
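To make the control-versus-CoT comparison concrete, the sketch below builds the two kinds of prompt and parses a numeric PHQ-8 total out of a model's reply. It is a minimal illustration under stated assumptions, not the authors' prompts. The item summaries are hypothetical paraphrases rather than the official PHQ-8 wording. The trigger sentence is a hypothetical variant in the style of zero-shot CoT ("Let's think step by step"). The "Total: <score>" answer format and the names build_prompt and parse_total are illustrative choices. It relies only on the standard PHQ-8 scoring rule: eight items, each rated 0 to 3, for a total of 0 to 24. No model is called here.

```python
import re
from typing import Optional

# Hypothetical, paraphrased PHQ-8 item summaries (not the official questionnaire wording).
PHQ8_ITEM_SUMMARIES = [
    "little interest or pleasure in doing things",
    "feeling down, depressed, or hopeless",
    "trouble sleeping, or sleeping too much",
    "feeling tired or having little energy",
    "poor appetite or overeating",
    "feeling bad about oneself",
    "trouble concentrating",
    "moving or speaking slowly, or being unusually restless",
]

# Hypothetical reasoning trigger in the style of zero-shot CoT prompting.
COT_TRIGGER = "Let's think step by step about each item before giving a score."


def build_prompt(transcript: str, use_cot: bool) -> str:
    """Build a control prompt (use_cot=False) or a zero-shot CoT prompt (use_cot=True)."""
    items = "\n".join(
        f"{i}. {summary}" for i, summary in enumerate(PHQ8_ITEM_SUMMARIES, start=1)
    )
    instructions = (
        "Read the interview transcript below and estimate the participant's PHQ-8 score.\n"
        "Each item is rated from 0 (not at all) to 3 (nearly every day), "
        "so the total ranges from 0 to 24.\n"
        f"Items:\n{items}\n\n"
        f"Transcript:\n{transcript}\n\n"
    )
    if use_cot:
        # CoT condition: ask the model to reason before committing to a number.
        return instructions + COT_TRIGGER + "\nEnd with a final line 'Total: <score>'."
    # Control condition: request the score directly, with no explicit reasoning step.
    return instructions + "Respond only with a final line 'Total: <score>'."


def parse_total(response: str) -> Optional[int]:
    """Return the last 'Total: N' in a response, or None if missing or outside 0-24."""
    matches = re.findall(r"Total:\s*(\d+)", response)
    if not matches:
        return None
    score = int(matches[-1])
    return score if 0 <= score <= 24 else None
```

Taking the last "Total:" match means intermediate totals that appear during the model's step-by-step reasoning are ignored in favor of its final answer.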
Abstract
When AI is used to detect signs of depressive disorder, models often draw premature conclusions. We hypothesize that using chain-of-thought (CoT) prompting to estimate Patient Health Questionnaire-8 (PHQ-8) scores will improve the accuracy of the scores produced by AI models. We found that when the models reasoned with CoT, their estimated PHQ-8 scores were consistently closer, on average, to the ground-truth scores self-reported by each participant than when they did not use CoT. Our goal is to improve AI models' understanding of the intricacies of human conversation so that they can better assess a patient's feelings and tone and, in turn, more accurately identify symptoms of mental disorders. Ultimately, we hope to augment these models' capabilities so that they can be made widely accessible and used in the medical field.
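One way to make "closer on average to the true scores" precise is to compare the mean absolute error (MAE) of each condition against the same self-reported PHQ-8 totals. The sketch below does this and also reports the share of participants for whom the CoT estimate is strictly closer. MAE and that per-participant fraction are assumptions chosen for illustration; the abstract does not state the exact metric used. The function names are hypothetical, and the sketch uses only the Python standard library.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def absolute_errors(predicted: Sequence[float], true: Sequence[float]) -> List[float]:
    """Per-participant distance |predicted - true| between estimated and reported scores."""
    if len(predicted) != len(true) or not true:
        raise ValueError("predicted and true must be non-empty and the same length")
    return [abs(p - t) for p, t in zip(predicted, true)]


def compare_conditions(
    control: Sequence[float], cot: Sequence[float], true: Sequence[float]
) -> Dict[str, float]:
    """Compare control and CoT estimates against the same self-reported PHQ-8 scores."""
    control_err = absolute_errors(control, true)
    cot_err = absolute_errors(cot, true)
    return {
        # Mean absolute error: lower means closer to the true scores on average.
        "control_mae": fmean(control_err),
        "cot_mae": fmean(cot_err),
        # Share of participants whose CoT estimate is strictly closer than the control one.
        "cot_closer_fraction": fmean(
            1.0 if c < b else 0.0 for c, b in zip(cot_err, control_err)
        ),
    }
```

Because both conditions are scored on the same participants, the per-participant errors are paired. A paired test such as the Wilcoxon signed-rank test on control_err versus cot_err would be one way to check whether a lower CoT MAE is statistically significant.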
