Decomposing Theory of Mind: How Emotional Processing Mediates ToM Abilities in LLMs

Ivan Chulo; Ananya Joshi

TL;DR

This paper investigates the cognitive mechanisms underlying Theory of Mind (ToM) in large language models and how activation steering via Contrastive Activation Addition (CAA) modulates them. It decomposes ToM into 45 cognitive actions, trains linear probes to detect them, and applies CAA steering to evaluate changes on 1,000 BigToM forward belief scenarios. Steering raises accuracy by 14.2 percentage points, from 32.5% to 46.7%, and the probes link this improvement to enhanced emotional processing together with suppressed analytical interrogation. The key finding is that emotional understanding and generative hypothesis formation, rather than purely analytical reasoning, mediate ToM performance, suggesting that ToM in LLMs relies on affective representations. The work provides a mechanistic interpretability framework that combines targeted interventions with probe-based decomposition to analyze high-level cognitive abilities, and it can inform the design of steering interventions for improved social reasoning in AI.
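Contrastive Activation Addition, as it is commonly described, builds a steering vector from the mean difference between a layer's activations on contrastive prompt pairs and then adds a scaled copy of that vector to the same layer's hidden states at inference time. The sketch below illustrates only that arithmetic on synthetic arrays. The function names `caa_steering_vector` and `apply_steering`, the coefficient `alpha`, and the array shapes are hypothetical and are not taken from the paper's Gemma-3-4B setup.

```python
import numpy as np


def caa_steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Mean difference between paired positive and negative activations.

    pos_acts, neg_acts: (n_pairs, d_model) activations from one layer, where row i
    of each array comes from the same contrastive prompt pair (hypothetical shapes).
    """
    if pos_acts.shape != neg_acts.shape:
        raise ValueError("contrastive activation arrays must have the same shape")
    return (pos_acts - neg_acts).mean(axis=0)


def apply_steering(hidden: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    """Add alpha * vector to every token position of a (seq_len, d_model) hidden state."""
    # Broadcasting adds the same steering vector at each position.
    return hidden + alpha * vector


rng = np.random.default_rng(0)
d_model = 8
pos = rng.normal(loc=1.0, size=(16, d_model))  # synthetic "positive" activations
neg = rng.normal(loc=0.0, size=(16, d_model))  # synthetic "negative" activations
v = caa_steering_vector(pos, neg)

hidden = rng.normal(size=(5, d_model))  # synthetic hidden states for a 5-token prompt
steered = apply_steering(hidden, v, alpha=4.0)
print(steered.shape)  # (5, 8)
```

In an actual model the addition would happen inside the forward pass at the chosen layer, for example through a forward hook. That wiring is model-specific and is omitted here.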

Abstract

Recent work shows that activation steering substantially improves language models' Theory of Mind (ToM) (Bortoletto et al. 2024), yet it remains unclear what changes occur internally to produce the different outputs. We propose decomposing ToM in LLMs by comparing the activations of steered and baseline models using linear probes trained on 45 cognitive actions. We applied Contrastive Activation Addition (CAA) steering to Gemma-3-4B and evaluated it on 1,000 BigToM forward belief scenarios (Gandhi et al. 2023). We find that the improved performance on belief attribution tasks (from 32.5% to 46.7% accuracy) is mediated by activations that process emotional content, namely emotion perception (+2.23) and emotion valuing (+2.20), while analytical processes are suppressed: questioning (-0.78) and convergent thinking (-1.59). This suggests that successful ToM abilities in LLMs are mediated by emotional understanding rather than analytical reasoning.
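The steered-versus-baseline comparison can be illustrated with linear probes. Each probe scores an activation for one cognitive action, and the shift for that action is the mean probe score on steered activations minus the mean score on baseline activations. The sketch below is a hypothetical illustration on synthetic data. The probe weights, the label names, and the use of the raw logit `acts @ W + b` as the score are assumptions, and the sketch does not claim to reproduce how the reported values such as +2.23 were computed.

```python
import numpy as np


def probe_scores(acts: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Linear probe logits: (n, d_model) @ (d_model, n_actions) + (n_actions,) -> (n, n_actions)."""
    return acts @ W + b


def mean_score_shift(baseline: np.ndarray, steered: np.ndarray,
                     W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-action difference in mean probe logit, steered minus baseline."""
    return probe_scores(steered, W, b).mean(axis=0) - probe_scores(baseline, W, b).mean(axis=0)


rng = np.random.default_rng(1)
d_model, n_actions = 8, 3
# Hypothetical subset of the 45 cognitive actions.
actions = ["emotion_perception", "questioning", "convergent_thinking"]
W = rng.normal(size=(d_model, n_actions))  # stand-in for trained probe weights
b = np.zeros(n_actions)

baseline = rng.normal(size=(100, d_model))  # synthetic baseline activations
steering_vec = rng.normal(size=d_model)
steered = baseline + steering_vec  # synthetic steered activations

shift = mean_score_shift(baseline, steered, W, b)
# Rank actions from most enhanced to most suppressed.
for name, delta in sorted(zip(actions, shift), key=lambda t: -t[1]):
    print(f"{name}: {delta:+.2f}")
```

Because the probes are linear and the toy steered activations are a pure translation of the baseline, the shift here reduces exactly to `steering_vec @ W`. If the probed layer lies downstream of the steering layer, the model's nonlinearities break that shortcut, which is why probe outputs would be compared on the actual steered activations.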

Figures (7)