Emergence of psychopathological computations in large language models

Soo Yong Lee; Hyunjin Hwang; Taekwan Kim; Yuyeong Kim; Kyuri Park; Jaemin Yoo; Denny Borsboom; Kijung Shin

TL;DR

This paper asks whether large language models (LLMs) instantiate computations akin to psychopathology. It introduces a computational-theoretical framework that maps network-theory concepts onto AI systems, and it combines a measurement and intervention approach (S3AE) with an iterative Q&A design to reveal a dynamic, cyclic structural causal model among representational units that correspond to psychopathology symptoms. Across twelve LLM families, the authors show that these units can be measured and intervened on, that their interactions form dense, self-sustaining networks aligned with depression and mania communities, and that activating the units yields coherent, symptom-consistent behaviors that resist countermeasures. This effect strengthens with model size. The findings point to the emergence of psychopathology-like computations in LLMs, which opens avenues for in silico modeling and raises safety concerns about controllability as AI systems scale. The work offers a principled framework and empirical evidence for studying computational psychopathology in AI, and it underscores the need for robust safety considerations and further generalization studies.
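To make the phrase "dense, self-sustaining networks" concrete, here is a toy sketch in Python. It is not the authors' S3AE method or their estimated causal model. The update rule (a clipped linear threshold), the `decay` and coupling values, and the names `simulate`, `chain`, and `dense_cycle` are all assumptions chosen for illustration. The sketch activates one unit once and compares how activation evolves in a sparse acyclic chain and in a densely coupled cyclic network.

```python
def simulate(weights, initial, steps=50, decay=0.5):
    """Iterate a clipped linear update on unit activations in [0, 1].

    weights[j][i] is the (hypothetical) influence of unit j on unit i.
    No external input is applied after the initial state.
    """
    n = len(initial)
    x = list(initial)
    for _ in range(steps):
        nxt = []
        for i in range(n):
            drive = sum(weights[j][i] * x[j] for j in range(n))
            # Each unit keeps a decayed copy of itself plus incoming drive.
            nxt.append(min(1.0, max(0.0, decay * x[i] + drive)))
        x = nxt
    return x


N = 4      # number of toy "symptom" units (illustrative only)
W = 0.3    # coupling strength (assumed value)

# Sparse acyclic chain: 0 -> 1 -> 2 -> 3, no feedback.
chain = [[0.0] * N for _ in range(N)]
for j in range(N - 1):
    chain[j][j + 1] = W

# Dense cyclic network: every unit excites every other unit.
dense_cycle = [[0.0 if i == j else W for i in range(N)] for j in range(N)]

start = [1.0, 0.0, 0.0, 0.0]  # activate only unit 0, once

chain_final = simulate(chain, start)
cyclic_final = simulate(dense_cycle, start)

print("chain :", chain_final)
print("cyclic:", cyclic_final)
```

Under these assumed parameters the chain's activation dies out, because nothing feeds back into earlier units, while the densely coupled cycle keeps all units active with no further input. That persistence is the sense in which a cyclic network can be self-sustaining; the paper's actual networks are estimated from LLM internals rather than specified by hand.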

Abstract

Can large language models (LLMs) instantiate computations of psychopathology? An effective approach to the question hinges on addressing two factors. First, for conceptual validity, we require a general and computational account of psychopathology that is applicable to computational entities without biological embodiment or subjective experience. Second, psychopathological computations, derived from the adapted theory, need to be empirically identified within the LLM's internal processing. Thus, we establish a computational-theoretical framework to provide an account of psychopathology applicable to LLMs. Based on the framework, we conduct experiments demonstrating two key claims: first, that the computational structure of psychopathology exists in LLMs; and second, that executing this computational structure results in psychopathological functions. We further observe that as LLM size increases, the computational structure of psychopathology becomes denser and the functions become more effective. Taken together, the empirical results corroborate our hypothesis that network-theoretic computations of psychopathology have already emerged in LLMs. This suggests that certain LLM behaviors mirroring psychopathology may not be superficial mimicry but a feature of their internal processing. Our work shows the promise of developing a powerful new in silico model of psychopathology and also points to the possibility of safety threats from AI systems with psychopathological behaviors in the near future.
