
Does ChatGPT Have a Mind?

Simon Goldstein, Benjamin A. Levinstein

TL;DR

Does ChatGPT Have a Mind? investigates whether LLMs have a folk psychology by treating internal representations and dispositions to act as separate questions. The authors argue that LLMs have robust internal representations that satisfy the conditions of several naturalistic theories of mental content. They support this with interpretability probes, causal interventions, and world-model considerations. Whether LLMs also have robust, stable dispositions to act remains inconclusive. The authors respond to the major skeptical challenges (sensory grounding, the "stochastic parrots" argument, and memorization), drawing on multimodal extensions and evidence of emergent capabilities. The work concludes that mind-like properties in LLMs are plausible but not settled. The case for internal representations is strong, while stable goal-directed behavior and its implications for moral status remain open questions.

Abstract

This paper examines the question of whether Large Language Models (LLMs) like ChatGPT possess minds, focusing specifically on whether they have a genuine folk psychology encompassing beliefs, desires, and intentions. We approach this question by investigating two key aspects: internal representations and dispositions to act. First, we survey various philosophical theories of representation, including informational, causal, structural, and teleosemantic accounts, arguing that LLMs satisfy key conditions proposed by each. We draw on recent interpretability research in machine learning to support these claims. Second, we explore whether LLMs exhibit robust dispositions to perform actions, a necessary component of folk psychology. We consider two prominent philosophical traditions, interpretationism and representationalism, to assess LLM action dispositions. While we find evidence suggesting LLMs may satisfy some criteria for having a mind, particularly in game-theoretic environments, we conclude that the data remain inconclusive. Additionally, we reply to several skeptical challenges to LLM folk psychology, including issues of sensory grounding, the "stochastic parrots" argument, and concerns about memorization. Our paper has three main upshots. First, LLMs do have robust internal representations. Second, it remains an open question whether LLMs have robust action dispositions. Third, existing skeptical challenges to LLM representation do not survive philosophical scrutiny.

Paper Structure

This paper contains 15 sections and 6 figures.

Figures (6)

  • Figure 1: Simplified two-layer transformer architecture processing "The cat is on the". Each word is initially converted to an embedding vector. In each layer, self-attention (Att) allows words to attend to each other, followed by a multi-layer perceptron (MLP). After the first layer, new contextual embeddings are created. The final layer produces probabilities for the next token.
  • Figure 2: Illustration of the probing process using Othello-GPT. The LLM processes the input sequence of Othello moves, generating activations. A separate probing classifier is trained to predict specific features (e.g., the state of a particular square) from these activations.
  • Figure 3: Activation patching in Othello-GPT. (a) The top panel shows the Othello board state and model predictions before and after intervention. The upper row in each state displays the model's move predictions with associated probabilities, while the lower row shows the actual board state. Pre-intervention, the model correctly predicts legal moves. Post-intervention, the model's predictions change. (b) The bottom panel illustrates the process of activation patching across different layers and timesteps of the model. The intervention at a specific layer and timestep propagates through subsequent layers, ultimately affecting the final prediction. This demonstration shows how altering internal representations can causally influence the model's outputs, even leading to predictions of moves that are illegal in the game state shown above. Both depictions are adapted from li2022emergent.
  • Figure 4: Spatial and temporal representations in Llama-2-70b. Each point corresponds to the layer 50 activations of the last token of a place (top) or event (bottom) projected onto a learned linear probe direction. The clear structure in these projections, closely matching real-world geography and chronology, demonstrates that the model has learned coherent representations of space and time. All points depicted are from the test set. (Adapted from gurnee2024language.)
  • Figure 5: These figures together illustrate the study's approach to testing language models' understanding of color terms. While panel (b) suggests the model's ability to generalize to unseen colors, panel (a) shows how rotated color spaces were used to control for potential memorization of RGB-to-name mappings. This combined approach helps distinguish between true generalization and mere memorization of training data. (Adapted from patel2021mapping.)
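The probing and patching methods described in Figures 2 to 4 can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the procedure of li2022emergent or gurnee2024language. It assumes a hypothetical two-layer linear "model" with made-up weights (`W1`, `W2`). It then fits a ridge-regression probe, one common choice of linear probe, to decode an input feature from the hidden activations. Finally, it performs a basic form of activation patching by copying hidden activations from a source run into a base run. All helper names (`hidden`, `run`, `solve`, `fit_ridge_probe`) are illustrative.

```python
import random

# Hypothetical toy "model" standing in for an LLM: a fixed linear layer maps a
# 3-feature input to 4 hidden activations, then a linear readout gives the
# output. Weights are arbitrary illustrative values, not from any real model.
W1 = [[1.0, 0.0, 0.5],
      [0.0, 1.0, -0.5],
      [0.5, 0.5, 1.0],
      [1.0, -1.0, 0.0]]
W2 = [0.9, -0.4, 0.6, 0.3]


def hidden(x):
    """Layer-1 activations for input x (what the probe reads)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W1]


def run(x, patch=None):
    """Forward pass; `patch` maps hidden-unit index -> value to overwrite."""
    h = hidden(x)
    for i, v in (patch or {}).items():
        h[i] = v
    return sum(w * hi for w, hi in zip(W2, h))


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def fit_ridge_probe(H, y, lam=1e-3):
    """Linear probe: minimize ||H w - y||^2 + lam * ||w||^2 (normal equations)."""
    d = len(H[0])
    A = [[sum(h[i] * h[j] for h in H) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(h[i] * yi for h, yi in zip(H, y)) for i in range(d)]
    return solve(A, b)


rng = random.Random(0)
inputs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(300)]
train, test = inputs[:200], inputs[200:]

# Probing: decode the first input feature (think "state of one board square"
# or "a place's longitude") from hidden activations alone; check on held-out data.
probe = fit_ridge_probe([hidden(x) for x in train], [x[0] for x in train])
test_err = max(abs(sum(p * h for p, h in zip(probe, hidden(x))) - x[0])
               for x in test)

# Activation patching: overwrite hidden units of a "base" run with values from
# a "source" run, and measure how much the output moves.
base, source = test[0], test[1]
h_src = hidden(source)
full_patch = run(base, patch={i: v for i, v in enumerate(h_src)})
unit_effects = [run(base, patch={i: h_src[i]}) - run(base) for i in range(4)]
```

Because this toy model is linear, the probe decodes the feature almost exactly, and the per-unit patching effects sum to the full effect. In a real LLM, neither property holds in general. The two steps also answer different questions. A successful probe shows that information is decodable from the activations. Patching tests whether the output causally depends on those activations, mirroring the contrast between Figure 2 and Figure 3.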