Traces of Social Competence in Large Language Models

Tom Kouwenhoven, Michiel van der Meer, Max van Duijn

Abstract

The False Belief Test (FBT) has been the main method for assessing Theory of Mind (ToM) and related socio-cognitive competencies. For Large Language Models (LLMs), the reliability and explanatory potential of this test have remained limited due to issues like data contamination, insufficient model details, and inconsistent controls. We address these issues by testing 17 open-weight models on a balanced set of 192 FBT variants (Trott et al. 2023) using Bayesian logistic regression to identify how model size and post-training affect socio-cognitive competence. We find that scaling model size benefits performance, but not strictly. A cross-over effect reveals that explicating propositional attitudes (X thinks) fundamentally alters response patterns. Instruction tuning partially mitigates this effect, but further reasoning-oriented fine-tuning amplifies it. In a case study analysing social reasoning ability throughout OLMo 2 training, we show that this cross-over effect emerges during pre-training, suggesting that models acquire stereotypical response patterns tied to mental-state vocabulary that can outweigh other scenario semantics. Finally, vector steering allows us to isolate a think vector as the causal driver of observed FBT behaviour.
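
The abstract does not say how the think vector is constructed. As a rough illustration only, the sketch below uses one common steering recipe, a difference of mean activations between prompts with and without the explicit cue, and adds a scaled copy of that vector to a hidden state. The function names, the toy activations, and the scaling factor `alpha` are all assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch of difference-of-means activation steering.
# Assumption: this is a generic recipe, not the paper's confirmed procedure.

def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(with_cue, without_cue):
    """Mean activation with the cue minus mean activation without it."""
    mu_with = mean_vector(with_cue)
    mu_without = mean_vector(without_cue)
    return [a - b for a, b in zip(mu_with, mu_without)]

def steer(hidden, vector, alpha):
    """Add a scaled steering vector to a single hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]

# Toy activations (hypothetical values, three hidden dimensions).
with_cue = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]     # prompts containing "X thinks"
without_cue = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]  # matched prompts without it

think_vector = steering_vector(with_cue, without_cue)
# A negative alpha subtracts the direction, a positive alpha adds it.
steered = steer([0.5, 0.5, 0.5], think_vector, alpha=-1.0)
```

In a real model the activations would come from a chosen layer of the network, and the choice of layer and of `alpha` would need to be tuned; both are left open here.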

Paper Structure (25 sections, 1 equation, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Example False Belief task from Trott et al. (2023). A detailed version is given in the appendix on task and model details.
  • Figure 2: Effect of model size on the probability of being correct given a knowledge condition without any other interaction effects. Scaling positively influences False Belief performance but not True Belief. Dashed coloured lines indicate human performance.
  • Figure 3: The probability of predicting the location correctly for model variant, knowledge state, and cue. Triangles indicate the average human performance.
  • Figure 4: Strict performance during pre-training for different model sizes. Shaded areas indicate $95\%$ CI.
  • Figure 5: The percentage of correct answers in different base or post-training phases for differently sized OLMo 2 models. Coloured dashed lines indicate human performance, and the bars indicate 95% confidence intervals.
  • ...and 5 more figures