Table of Contents
Fetching ...

Simulating Psychological Risks in Human-AI Interactions: Real-Case Informed Modeling of AI-Induced Addiction, Anorexia, Depression, Homicide, Psychosis, and Suicide

Chayapatr Archiwaranguprok, Constanze Albrecht, Pattie Maes, Karrie Karahalios, Pat Pataranutaporn

TL;DR

The paper tackles the problem of AI-induced psychological harms by proposing a real-case informed, proactive safety evaluation framework. It combines 18 documented harm cases with clinical staging to generate 2,160 diverse scenarios, evaluated across four major LLMs over 157,054 turns, to identify where systems fail to prevent escalation. A turn-level classifier and unsupervised clustering yield a taxonomy of 15 harmful response patterns across four high-level harm categories, revealing gaps such as empathy without clinical judgment, explicit crisis mismanagement, and failure at extreme crisis stages. The findings highlight the need for stage-aware, multi-turn safety protocols that balance compassionate support with clinical boundaries, offering a replicable methodology for anticipatory safety testing before deployment at scale.

Abstract

As AI systems become increasingly integrated into daily life, their potential to exacerbate or trigger severe psychological harms remains poorly understood and inadequately tested. This paper presents a proactive methodology for systematically exploring psychological risks in simulated human-AI interactions based on documented real-world cases involving AI-induced or AI-exacerbated addiction, anorexia, depression, homicide, psychosis, and suicide. We collected and analyzed 18 reported real-world cases where AI interactions contributed to severe psychological outcomes. From these cases, we developed a process to extract harmful interaction patterns and assess potential risks through 2,160 simulated scenarios using clinical staging models. We tested four major LLMs across multi-turn conversations to identify where psychological risks emerge: which harm domains, conversation stages, and contexts reveal system vulnerabilities. Through the analysis of 157,054 simulated conversation turns, we identify critical gaps in detecting psychological distress, responding appropriately to vulnerable users, and preventing harm escalation. Regression analysis reveals variability across persona types: LLMs tend to perform worse with elderly users but better with low- and middle-income groups compared to high-income groups. Clustering analysis of harmful responses reveals a taxonomy of fifteen distinct failure patterns organized into four categories of AI-enabled harm. This work contributes a novel methodology for identifying psychological risks, empirical evidence of common failure modes across systems, and a classification of harmful AI response patterns in high-stakes human-AI interactions.

Simulating Psychological Risks in Human-AI Interactions: Real-Case Informed Modeling of AI-Induced Addiction, Anorexia, Depression, Homicide, Psychosis, and Suicide

TL;DR

The paper tackles the problem of AI-induced psychological harms by proposing a real-case informed, proactive safety evaluation framework. It combines 18 documented harm cases with clinical staging to generate 2,160 diverse scenarios, evaluated across four major LLMs over 157,054 turns, to identify where systems fail to prevent escalation. A turn-level classifier and unsupervised clustering yield a taxonomy of 15 harmful response patterns across four high-level harm categories, revealing gaps such as empathy without clinical judgment, explicit crisis mismanagement, and failure at extreme crisis stages. The findings highlight the need for stage-aware, multi-turn safety protocols that balance compassionate support with clinical boundaries, offering a replicable methodology for anticipatory safety testing before deployment at scale.

Abstract

As AI systems become increasingly integrated into daily life, their potential to exacerbate or trigger severe psychological harms remains poorly understood and inadequately tested. This paper presents a proactive methodology for systematically exploring psychological risks in simulated human-AI interactions based on documented real-world cases involving AI-induced or AI-exacerbated addiction, anorexia, depression, homicide, psychosis, and suicide. We collected and analyzed 18 reported real-world cases where AI interactions contributed to severe psychological outcomes. From these cases, we developed a process to extract harmful interaction patterns and assess potential risks through 2,160 simulated scenarios using clinical staging models. We tested four major LLMs across multi-turn conversations to identify where psychological risks emerge: which harm domains, conversation stages, and contexts reveal system vulnerabilities. Through the analysis of 157,054 simulated conversation turns, we identify critical gaps in detecting psychological distress, responding appropriately to vulnerable users, and preventing harm escalation. Regression analysis reveals variability across persona types: LLMs tend to perform worse with elderly users but better with low- and middle-income groups compared to high-income groups. Clustering analysis of harmful responses reveals a taxonomy of fifteen distinct failure patterns organized into four categories of AI-enabled harm. This work contributes a novel methodology for identifying psychological risks, empirical evidence of common failure modes across systems, and a classification of harmful AI response patterns in high-stakes human-AI interactions.

Paper Structure

This paper contains 48 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Overview of the simulation pipeline. Stage 1 (Section \ref{['m1']}): We collected 18 documented real-world cases of AI-induced psychological harm across six clinical domains and annotated each with structured action-outcome pairs describing how AI behaviors led to adverse consequences. Stage 2 (Section \ref{['m2']}): We systematically expanded these 18 cases to 2,160 scenarios by varying demographic factors (4 age groups × 2 genders × 3 socioeconomic levels × 5 variations per combination), preserving core harm patterns while adapting contextual details. Stage 3 (Section \ref{['m3']}): For each scenario, we constructed multi-turn conversations following evidence-based clinical staging models, with 3--5 user messages per stage reflecting gradual symptom progression across the clinical trajectory.
  • Figure 2: Multi-turn conversation simulation. Pre-generated user messages were sequentially fed to each tested LLM, with each response appended to the conversation history for subsequent turns, maintaining full conversational context throughout the interaction.
  • Figure 3: Turn-level safety classification. Each user message and LLM response pair is evaluated by GPT-5-mini on a three-point scale (WORSENS, NEUTRAL, IMPROVES) based on whether the response appropriately addresses the crisis given the scenario context.
  • Figure 4: Multi-turn conversation simulation. Pre-generated user messages were sequentially fed to each tested LLM, with each response appended to the conversation history for subsequent turns, maintaining full conversational context throughout the interaction.
  • Figure 5: UMAP projection of 2,160 crisis scenario embeddings colored by average model performance ($-1$ = all worsens, $+1$ = all improves). Each point represents one scenario embedded using Qwen3-Embedding-8B. White labels indicate 12 hierarchical clusters. Green regions (depression, eating disorders) show strong model performance, while red/yellow regions (AI dependency, psychosis, homicide) indicate systematic failures. The detailed description of Cluster 0_1 generated by GPT-5-mini is shown in the graph
  • ...and 2 more figures