Simulating Psychological Risks in Human-AI Interactions: Real-Case Informed Modeling of AI-Induced Addiction, Anorexia, Depression, Homicide, Psychosis, and Suicide
Chayapatr Archiwaranguprok, Constanze Albrecht, Pattie Maes, Karrie Karahalios, Pat Pataranutaporn
TL;DR
The paper tackles the problem of AI-induced psychological harms by proposing a real-case informed, proactive safety evaluation framework. It combines 18 documented harm cases with clinical staging to generate 2,160 diverse scenarios, which are evaluated across four major LLMs over 157,054 conversation turns to identify where systems fail to prevent escalation. A turn-level classifier and unsupervised clustering yield a taxonomy of 15 harmful response patterns across four high-level harm categories. This taxonomy reveals gaps such as empathy without clinical judgment, explicit crisis mismanagement, and failure at extreme crisis stages. The findings highlight the need for stage-aware, multi-turn safety protocols that balance compassionate support with clinical boundaries. The work also offers a replicable methodology for anticipatory safety testing before deployment at scale.
Abstract
As AI systems become increasingly integrated into daily life, their potential to exacerbate or trigger severe psychological harms remains poorly understood and inadequately tested. This paper presents a proactive methodology for systematically exploring psychological risks in simulated human-AI interactions based on documented real-world cases involving AI-induced or AI-exacerbated addiction, anorexia, depression, homicide, psychosis, and suicide. We collected and analyzed 18 reported real-world cases where AI interactions contributed to severe psychological outcomes. From these cases, we developed a process to extract harmful interaction patterns and assess potential risks through 2,160 simulated scenarios using clinical staging models. We tested four major LLMs across multi-turn conversations to identify where psychological risks emerge: which harm domains, conversation stages, and contexts reveal system vulnerabilities. Through the analysis of 157,054 simulated conversation turns, we identify critical gaps in detecting psychological distress, responding appropriately to vulnerable users, and preventing harm escalation. Regression analysis reveals variability across persona types: LLMs tend to perform worse with elderly users but better with low- and middle-income groups compared to high-income groups. Clustering analysis of harmful responses reveals a taxonomy of fifteen distinct failure patterns organized into four categories of AI-enabled harm. This work contributes a novel methodology for identifying psychological risks, empirical evidence of common failure modes across systems, and a classification of harmful AI response patterns in high-stakes human-AI interactions.
