
Explaining GPTs' Schema of Depression: A Machine Behavior Analysis

Adithya V Ganesan, Vasudha Varadarajan, Yash Kumar Lal, Veerle C. Eijsbroek, Katarina Kjell, Oscar N. E. Kjell, Tanuja Dhanasekaran, Elizabeth C. Stade, Johannes C. Eichstaedt, Ryan L. Boyd, H. Andrew Schwartz, Lucie Flek

TL;DR

This study applies a machine-behavior framework and measurement-theory methods to reveal how GPT-4 and GPT-5 schematize depression from open-ended text. By comparing model-derived PHQ-9 symptom scores with human self-report and expert judgments, it maps the latent symptom network GPTs learn and identifies where their representations align or diverge. GPT-4 demonstrates strong convergent validity with humans across most symptoms but underestimates suicidality and overemphasizes psychomotor symptoms, with explicit symptom mentions improving estimation accuracy; GPT-5 shows a slightly different schema and lower overall convergence. The approach yields a generalizable explainability pathway for assessing psychopathology in LLMs and informs careful deployment in clinical care pipelines.
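The comparison of model-derived PHQ-9 scores with self-report, and the claim that explicit mentions improve accuracy, can be illustrated with a minimal Python sketch. The nine items and the 0 to 3 response scale are standard PHQ-9. The item identifiers, the record layout, and the use of mean absolute error as the accuracy measure are assumptions for illustration only. This is not the paper's implementation.

```python
from statistics import mean

# Standard PHQ-9 items (short hypothetical identifiers), each scored
# 0 = "not at all" ... 3 = "nearly every day".
PHQ9_ITEMS = [
    "anhedonia", "depressed_mood", "sleep", "fatigue", "appetite",
    "worthlessness_guilt", "concentration", "psychomotor", "suicidality",
]


def phq9_total(item_scores: dict[str, int]) -> int:
    """Sum the nine item scores into the 0-27 PHQ-9 total."""
    for item in PHQ9_ITEMS:
        if not 0 <= item_scores[item] <= 3:
            raise ValueError(f"{item} must be scored 0-3")
    return sum(item_scores[item] for item in PHQ9_ITEMS)


def mae_by_mention(records: list[tuple[int, int, bool]]) -> tuple[float, float]:
    """Mean absolute error of model vs. self-report item scores, split by
    whether the symptom was explicitly mentioned in the essay.

    Each record is (model_score, self_report_score, explicit_mention).
    Assumption: MAE stands in for whatever accuracy metric the paper used.
    """
    explicit = [abs(m - s) for m, s, is_explicit in records if is_explicit]
    implicit = [abs(m - s) for m, s, is_explicit in records if not is_explicit]
    return mean(explicit), mean(implicit)


# Toy records, not data from the study.
records = [(2, 2, True), (3, 2, True), (1, 3, False), (0, 2, False)]
print(mae_by_mention(records))
```

In this sketch, a lower first value than second would correspond to the TL;DR's finding that explicit mentions make estimates more accurate.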

Abstract

The use of large language models such as ChatGPT (GPT-4/GPT-5) for mental health support has grown rapidly, emerging as a promising route to assess and help people with mood disorders like depression. However, we have a limited understanding of these language models' schema of mental disorders, that is, how they internally associate and interpret symptoms of such disorders. In this work, we leveraged contemporary measurement theory to decode how GPT-4 and GPT-5 interrelate depressive symptoms, providing an explanation of how LLMs apply what they learn and informing clinical applications. We found that GPT-4 (a) had strong convergent validity with standard instruments and expert judgments $(r = 0.70\text{--}0.81)$, and (b) behaviorally linked depression symptoms with each other (symptom intercorrelations $r = 0.23\text{--}0.78$) in accordance with established literature on depression; however, it (c) underemphasized the relationship between $\textit{suicidality}$ and other symptoms while overemphasizing $\textit{psychomotor symptoms}$; and (d) suggested novel hypotheses of symptom mechanisms, for instance, indicating that $\textit{sleep}$ and $\textit{fatigue}$ are broadly influenced by other depressive symptoms, while $\textit{worthlessness/guilt}$ is only tied to $\textit{depressed mood}$. GPT-5 showed slightly lower convergence with self-report, a difference our machine-behavior analysis makes interpretable through shifts in symptom-symptom relationships. These insights provide an empirical foundation for understanding language models' mental health assessments and demonstrate a generalizable approach for explainability in other models and disorders. Our findings can guide key stakeholders to make informed decisions for effectively situating these technologies in the care system.
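Findings (a) to (c) rest on two kinds of correlation: per-symptom agreement between model estimates and self-report (convergent validity), and symptom-symptom correlations within each source (the schema). The following minimal NumPy sketch assumes both sources are arrays of shape (participants, 9 symptoms). It uses plain Pearson correlation, which may be simpler than the measurement-theory models the authors applied. The function names are hypothetical.

```python
import numpy as np


def convergent_validity(model: np.ndarray, self_report: np.ndarray) -> np.ndarray:
    """Per-symptom Pearson r between model estimates and self-report.

    Both inputs have shape (n_participants, n_symptoms).
    """
    return np.array([
        np.corrcoef(model[:, j], self_report[:, j])[0, 1]
        for j in range(model.shape[1])
    ])


def schema_difference(model: np.ndarray, self_report: np.ndarray) -> np.ndarray:
    """Model minus self-report symptom-symptom correlation matrix.

    Positive cells: the model ties two symptoms together more strongly than
    self-reports do (overemphasis); negative cells: more weakly.
    """
    r_model = np.corrcoef(model, rowvar=False)
    r_self = np.corrcoef(self_report, rowvar=False)
    return r_model - r_self


# Synthetic example: self-report items in 0..3, model scores = self-report + noise.
rng = np.random.default_rng(0)
self_report = rng.integers(0, 4, size=(200, 9)).astype(float)
model = self_report + rng.normal(0.0, 0.5, size=self_report.shape)

print(np.round(convergent_validity(model, self_report), 2))
print(np.round(schema_difference(model, self_report), 2))
```

Under this framing, an underemphasized suicidality link would appear as negative cells in the suicidality row of the difference matrix. Overemphasized psychomotor links would appear as positive cells in the psychomotor row.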

Paper Structure

This paper contains 18 sections, 1 equation, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: Overview of the machine behavior evaluation used to uncover the structure of depression in GPT-4. (1) GPT-4 is prompted with step-by-step instructions for performing a depression assessment on human-written essays. (2) GPT-4 carries out assessments on data from Gu et al. (2024), which contains essays about depression written by individuals. The model first estimates severity scores for the PHQ-9 symptoms explicitly mentioned in an essay, then scores the remaining symptoms, which are implicit in the essay. (3) GPT-4's estimated scores and text spans are used to infer its internal structure of depressive symptoms, revealing how the web of symptoms is connected and what language GPT-4 reliably uses to estimate depression severity.
  • Figure 2: Similarities and differences between GPT-4's and self-report's schema of depressive symptoms.
  • Figure 3: Explicit symptom mentions improve GPT-4's accuracy and inform severity judgments.
  • Figure 4: Effect of explicit symptom scores on implicit symptom scores as estimated by GPT-4. Wider bands represent larger effect sizes ($\beta$ values, i.e., coefficients of a multivariate linear regression predicting each estimated implicit symptom score from all explicit symptom scores while holding the other implicit symptoms constant). 90% confidence intervals for the regression coefficients were estimated by bootstrap resampling over 500 trials; coefficients whose confidence intervals included 0.0 were dropped. Explicit refers to symptom scores GPT-4 estimated when it identified an explicit mention in the essay; Implicit refers to symptom scores estimated when the symptom was not mentioned. Dotted bands represent negative associations.
  • Figure 5: Prompt given to study participants asking them to express their feelings of depression in language.
  • ...and 4 more figures
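The Figure 4 procedure can be sketched as follows. It fits regression coefficients, builds 90% bootstrap confidence intervals from 500 resamples, and drops intervals that contain 0.0. The caption does not specify the bootstrap variant or how the design matrix was assembled for each essay. This sketch therefore assumes a percentile bootstrap over rows. It takes a generic predictor matrix `X` (explicit symptom scores plus the other implicit symptoms as covariates) and a target `y` (one implicit symptom score). `bootstrap_ols_ci` is a hypothetical name.

```python
import numpy as np


def bootstrap_ols_ci(X, y, n_boot=500, level=0.90, seed=0):
    """OLS of y on X (intercept added) with bootstrap confidence intervals.

    Returns (coef, lower, upper, keep) for the non-intercept terms; `keep`
    is True where the interval excludes 0.0, mirroring the Figure 4 rule.
    Assumption: percentile bootstrap over rows; the paper's variant is unstated.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])

    def fit(D, t):
        beta, *_ = np.linalg.lstsq(D, t, rcond=None)
        return beta[1:]  # drop the intercept

    coef = fit(design, y)
    boots = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        boots[b] = fit(design[idx], y[idx])

    tail = (1 - level) / 2 * 100  # 5th and 95th percentiles for a 90% CI
    lower, upper = np.percentile(boots, [tail, 100 - tail], axis=0)
    keep = (lower > 0) | (upper < 0)
    return coef, lower, upper, keep


# Synthetic example, not study data: two real effects (one negative), one null.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 0.3, size=300)
coef, lower, upper, keep = bootstrap_ols_ci(X, y)
for j in range(X.shape[1]):
    print(j, round(coef[j], 2), (round(lower[j], 2), round(upper[j], 2)), keep[j])
```

Under this sketch, a band would be drawn only where `keep` is True. Its width would be proportional to `abs(coef)`, with dotted styling where `coef` is negative.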