Enhancing Depression Diagnosis with Chain-of-Thought Prompting

Elysia Shi, Adithri Manda, London Chowdhury, Runeema Arun, Kevin Zhu, Michael Lam

TL;DR

This work investigates using chain-of-thought prompting to improve AI-derived PHQ-8 depression scores. By comparing a control prompt to a zero-shot CoT prompt on the DAIC-WOZ dataset with GPT-3.5-turbo, the authors report that CoT reasoning yields scores closer on average to the participants' self-reported values, although the statistical significance of the improvement has not yet been robustly established. The study highlights improved interpretability and potential clinical value for AI-assisted depression screening, while acknowledging limitations related to dataset diversity, model scope, and ethical considerations. The findings suggest a path toward more transparent and accessible mental health diagnostics, contingent on broader validation and responsible deployment guidelines.

Abstract

When AI models are used to detect signs of depressive disorder, they often draw premature conclusions. We hypothesize that using chain-of-thought (CoT) prompting to evaluate Patient Health Questionnaire-8 (PHQ-8) scores will improve the accuracy of the scores that AI models assign. In our findings, when the models reasoned with CoT, the estimated PHQ-8 scores were consistently closer on average to the accepted true scores reported by each participant than when CoT was not used. Our goal is to deepen AI models' understanding of the intricacies of human conversation so that they can assess a patient's feelings and tone more effectively and therefore discern symptoms of mental disorders more accurately. Ultimately, we hope to strengthen AI models' abilities so that they can be widely accessible and used in the medical field.
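
The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' code: the prompt wording, the function names `build_prompt` and `mean_abs_difference`, and the use of the absolute difference as the "point difference" are all assumptions made for illustration. The only details taken from outside this page are the zero-shot CoT trigger phrase "Let's think step by step." from Kojima et al. (2022) and the standard PHQ-8 total range of 0 to 24.

```python
# Hypothetical sketch of a control prompt vs. a zero-shot CoT prompt,
# plus the average point difference between estimated and true PHQ-8 scores.

# Zero-shot CoT trigger phrase from Kojima et al. (2022).
COT_TRIGGER = "Let's think step by step."

# PHQ-8 has 8 items, each scored 0-3, so totals range from 0 to 24.
PHQ8_MIN, PHQ8_MAX = 0, 24


def build_prompt(transcript: str, use_cot: bool) -> str:
    """Build a scoring prompt. The wording is illustrative, not the authors'."""
    base = (
        "Read the interview transcript below and estimate the participant's "
        f"PHQ-8 total score ({PHQ8_MIN}-{PHQ8_MAX}).\n\n"
        f"Transcript:\n{transcript}\n\n"
    )
    if use_cot:
        # Zero-shot CoT: append the trigger phrase, with no worked examples.
        return base + COT_TRIGGER
    # Control condition: ask for the score directly.
    return base + "Answer with the score only."


def mean_abs_difference(predicted: list[int], true: list[int]) -> float:
    """Average absolute point difference between predicted and true scores.

    Assumption: "point difference" is treated as |predicted - true|.
    """
    if len(predicted) != len(true):
        raise ValueError("predicted and true must have the same length")
    if not predicted:
        raise ValueError("at least one score pair is required")
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(predicted)
```

Calling `mean_abs_difference` once with the control-prompt estimates and once with the CoT estimates, each against the same self-reported scores, gives the two averages being compared; the lower value indicates the prompt whose estimates sit closer to the true scores. Any scores passed in here would need to come from the actual model runs, since this page does not list per-participant values.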

Paper Structure (11 sections, 2 figures, 1 table)

This paper contains 11 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Illustration of zero-shot chain-of-thought prompting (Kojima et al., 2022)
  • Figure 2: Average point difference between Assigner A scores and the true scores, compared with the average point difference between Assigner B scores and the true scores