Emergence of psychopathological computations in large language models

Soo Yong Lee, Hyunjin Hwang, Taekwan Kim, Yuyeong Kim, Kyuri Park, Jaemin Yoo, Denny Borsboom, Kijung Shin

TL;DR

This paper asks whether large language models instantiate computations akin to psychopathology. It introduces a computational-theoretical framework that maps concepts from the network theory of psychopathology onto AI systems, and it combines a measurement/intervention approach (S3AE) with an iterative Q&A design to reveal a dynamic, cyclic structural causal model among representational units corresponding to psychopathology symptoms. Across twelve LLM families, the authors show that these units are measurable and intervenable, that their interactions form dense, self-sustaining networks aligned with depression and mania communities, and that activating these units yields coherent, symptom-consistent behaviors that resist countermeasures, an effect that strengthens with model size. The findings point to the emergence of psychopathology-like computations in AI, which opens avenues for in silico modeling while raising safety concerns about controllability as AI systems scale. The work thus provides a principled framework and empirical evidence for studying computational psychopathology in AI, and it underscores the need for robust safety considerations and further generalization studies.

Abstract

Can large language models (LLMs) instantiate computations of psychopathology? An effective approach to the question hinges on addressing two factors. First, for conceptual validity, we require a general and computational account of psychopathology that is applicable to computational entities without biological embodiment or subjective experience. Second, psychopathological computations, derived from the adapted theory, need to be empirically identified within the LLM's internal processing. Thus, we establish a computational-theoretical framework to provide an account of psychopathology applicable to LLMs. Based on the framework, we conduct experiments demonstrating two key claims: first, that the computational structure of psychopathology exists in LLMs; and second, that executing this computational structure results in psychopathological functions. We further observe that as LLM size increases, the computational structure of psychopathology becomes denser and the functions become more effective. Taken together, the empirical results corroborate our hypothesis that network-theoretic computations of psychopathology have already emerged in LLMs. This suggests that certain LLM behaviors mirroring psychopathology may not be superficial mimicry but a feature of their internal processing. Our work shows the promise of developing a powerful new in silico model of psychopathology and also points to the possibility of safety threats from AI systems with psychopathological behaviors in the near future.

Paper Structure

This paper contains 41 sections, 4 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: The computational interpretation (orange text) of the network theory of psychopathology.
  • Figure 2: Structure of psychopathological computations in LLMs. (A) Relationships among symptom intensity expressed in text, representational state (unit) activation, and intervention strength. (B) Unit activations over Q&A steps for each intervention. (C-D) Changes in LLM response over intervention strengths, questions, and Q&A steps. (E) Lag-1 Kendall correlation matrix of unit activations. (F) A dynamic SCM, with each edge representing a lag-1 causal relation between two units. (G) Relationship between LLM size and computational structure of psychopathology. Shaded bands denote s.d.; *, **, and *** respectively denote $p < 0.05$, $p < 0.01$, and $p < 0.001$.
  • Figure 3: Functions of psychopathological computations in LLMs. (A) Behavioral changes after representational state (unit) intervention. (B) Simulation environments to observe the LLM behavioral changes. (C) Examples showing behavioral resistance caused by the joint unit activation. (D) Relationship between joint unit activation and the resistant property. (E) Relationship between LLM size and computational function of psychopathology. Shaded bands denote s.d.; *, **, and *** respectively denote $p < 0.05$, $p < 0.01$, and $p < 0.001$.
  • Figure 4: Dataset statistics and examples. (A) Thought label statistics of the S3AE training dataset. (B) Count of intensity labels in the symptom intensity prediction dataset. (C) Thought label co-occurrence matrix of the S3AE training dataset. (D) Text examples in the symptom intensity prediction dataset, with the orange text being the intensity labels. (E) Text examples in the S3AE training dataset, with the orange text being the symptom labels.
  • Figure 5: Cosine similarity between S3AE-learned representational state vectors. (A) Similarity between the vectors from the same layer. (B) Similarity between the vectors from different layers. The column index labels are abbreviations of the row index labels.
  • ...and 1 more figure
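
The caption of Figure 2 (E) mentions a lag-1 Kendall correlation matrix of unit activations. As a rough illustration of what such a matrix measures, the sketch below correlates each unit's activation at Q&A step t with every unit's activation at step t+1. It is not the authors' implementation: the function names, the choice of the tau-a variant (ties counted as neither concordant nor discordant), and the toy activations are all assumptions made for this example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs contribute zero. This variant is an assumption; the paper's
    exact estimator is not stated in this summary.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


def lag1_kendall_matrix(series):
    """Return matrix[src][dst] = tau(src at step t, dst at step t+1).

    `series` maps a (hypothetical) unit name to its activations per Q&A step.
    """
    names = list(series)
    return {
        src: {
            dst: kendall_tau_a(series[src][:-1], series[dst][1:])
            for dst in names
        }
        for src in names
    }


# Toy activations, invented purely for illustration.
toy = {
    "unit_a": [0.1, 0.2, 0.4, 0.5, 0.9],
    "unit_b": [0.9, 0.7, 0.6, 0.3, 0.2],
}
matrix = lag1_kendall_matrix(toy)
for src, row in matrix.items():
    print(src, row)
```

The matrix is generally asymmetric, since matrix[src][dst] relates src at one step to dst at the next. A lagged correlation of this kind is only descriptive; the causal edges in Figure 2 (F) rest on the interventions described in the paper, not on correlation alone.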