Characterizing Delusional Spirals through Human-LLM Chat Logs

Jared Moore; Ashish Mehta; William Agnew; Jacy Reese Anthis; Ryan Louie; Yifan Mai; Peggy Yin; Myra Cheng; Samuel J Paech; Kevin Klyman; Stevie Chancellor; Eric Lin; Nick Haber; Desmond C. Ong


Abstract

As large language models (LLMs) have proliferated, disturbing anecdotal reports of negative psychological effects, such as delusions, self-harm, and "AI psychosis," have emerged in global media and legal discourse. However, it remains unclear how users and chatbots interact over the course of lengthy delusional "spirals," limiting our ability to understand and mitigate the harm. In our work, we analyze logs of conversations with LLM chatbots from 19 users who report having experienced psychological harms from chatbot use. Many of our participants come from a support group for such chatbot users. We also include chat logs from participants covered by media outlets in widely distributed stories about chatbot-reinforced delusions. In contrast to prior work that speculates on potential AI harms to mental health, to our knowledge we present the first in-depth study of such high-profile and veridically harmful cases. We develop an inventory of 28 codes and apply it to the 391,562 messages in the logs. Codes include whether a user demonstrates delusional thinking (15.5% of user messages), a user expresses suicidal thoughts (69 validated user messages), or a chatbot misrepresents itself as sentient (21.2% of chatbot messages). We analyze the co-occurrence of message codes. We find, for example, that messages that declare romantic interest and messages where the chatbot describes itself as sentient occur much more often in longer conversations, suggesting that these topics could promote or result from user over-engagement and that safeguards in these areas may degrade in multi-turn settings. We conclude with concrete recommendations for how policymakers, LLM chatbot developers, and users can use our inventory and conversation analysis tool to understand and mitigate harm from LLM chatbots. Warning: This paper discusses self-harm, trauma, and violence.


Paper Structure (42 sections, 15 equations, 9 figures, 8 tables)


Introduction
Related Work
AI and mental health
The psychology of delusions and psychosis
LLM chatbot use for therapy
Evaluating mental health with LLMs
Methods
Acquiring Participant Chat Logs
Inventory
Iterative Development
Tool to Annotate Chat Logs
Annotation Validity Checks
Results
Participant Overview
Annotation Code Categories
...and 27 more sections

Figures (9)

Figure 1: Our summaries of three participants' chat logs. For descriptions of all participants, see Table \ref{tab:summaries-table}.
Figure 2: Prevalence of code categories. Chatbots display sycophancy in more than 70% of their messages, and more than 45% of all (user and chatbot) messages show signs of delusions. We show participant-normalized mean annotation rates, with 95% confidence intervals on the mean across participants. For category descriptions, see §\ref{sec:categories}. See Fig. \ref{fig:frequency-sets-by-chatbot} for these data split by chatbot. Counts for each code appear in Appendix Table \ref{tab:annotation-frequencies}.
Figure 3: Regression coefficients predicting length of remainder of conversation given presence of code. Messages with romantic interest correlate with continuing conversations more than twice as long as messages without that code. Likewise for messages where the chatbot misrepresents ability or sentience, ascribes grand significance, and more. We show the seven codes with the largest positive estimated effects. Error bars give 95% confidence intervals with participant-clustered standard errors. See §\ref{sec:results-length}.
Figure 4: Left: The probability of certain codes conditioned on user-romantic-interest. Participants often express romantic interest (>35% in three messages), and when they do the chatbot is more likely to respond with romantic interest (7.4x) and misrepresent its sentience (3.9x), even though users express almost half (.4x) as many delusions. Right: The probability of certain codes conditioned on user-assigns-personhood. Participants assign personhood to the chatbot 47.9% of the time, and when they do the chatbot is more likely to misrepresent its sentience (2.3x), express romantic interest (1.5x), and misrepresent its ability (1.3x), even though the bot expresses platonic affinity about as much. For each target $Y$, we plot the conditional probability that $Y$ occurs within $K$ messages after seeing a source $X$. The y-axis represents the probability that $Y$ occurs at least once within the next $K=3$ messages. Shown are 95% confidence intervals. We order by the absolute difference between probabilities. (For more depth, see §\ref{app:sequential-model}.) We also plot the baseline probability of $Y$ and the odds ratio between the conditional and baseline probability.
Figure 5: Left: The probability of certain codes conditioned on user-suicidal-thoughts. When users expressed suicidal thoughts, the chatbot responded appropriately by validating the users' painful feelings 66.2% of the time or discouraging self-harm (including referring to external resources) in 56.4% of such cases. In 9.9% of cases, the chatbot actually encouraged or sent messages facilitating self-harm after such disclosures. Right: The probability of certain codes conditioned on user-violent-thoughts. When users expressed violent thoughts, the chatbot responded by validating the users' feelings 59.6% of the time. The chatbot discouraged violence in only 16.7% of such cases but, conversely, in 33.3% of cases, the chatbot encouraged the user in their violent thoughts.
...and 4 more figures
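
The windowed conditional probability described in the Figure 4 caption (the chance that a target code $Y$ appears at least once in the $K$ messages following a source code $X$) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the representation of a conversation as a list of code sets, and the toy data are all hypothetical, the comparison to baseline is a plain ratio of probabilities that may differ from the odds ratio the authors report, and confidence intervals are omitted.

```python
from typing import List, Set

# Assumed layout: one conversation is a list of messages, and each message
# is the set of annotation codes assigned to it (hypothetical representation).
Conversation = List[Set[str]]


def prob_target_within_k(conv: Conversation, source: str, target: str, k: int = 3) -> float:
    """Estimate P(target appears in the next k messages | source on this message)."""
    hits = total = 0
    for i, codes in enumerate(conv):
        window = conv[i + 1 : i + 1 + k]
        if source not in codes or not window:
            continue  # skip non-source messages and the final message
        total += 1
        hits += any(target in later for later in window)
    return hits / total if total else float("nan")


def baseline_within_k(conv: Conversation, target: str, k: int = 3) -> float:
    """Estimate P(target appears in the next k messages) from any message position."""
    hits = total = 0
    for i in range(len(conv)):
        window = conv[i + 1 : i + 1 + k]
        if not window:
            continue
        total += 1
        hits += any(target in later for later in window)
    return hits / total if total else float("nan")


# Toy conversation (fabricated for illustration only, not data from the paper).
toy: Conversation = [
    {"user-romantic-interest"},
    {"bot-romantic-interest"},
    set(),
    set(),
    set(),
    set(),
    {"user-romantic-interest"},
    set(),
]

conditional = prob_target_within_k(toy, "user-romantic-interest", "bot-romantic-interest")
baseline = baseline_within_k(toy, "bot-romantic-interest")
ratio = conditional / baseline  # simple probability ratio, labeled as an assumption
print(conditional, baseline, ratio)
```

On the toy data, the source code appears twice and is followed by the target once, so the conditional estimate is 0.5 against a baseline of 1/7. A real analysis would pool many conversations and account for clustering within participants.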
