Spontaneous High-Order Generalization in Neural Theory-of-Mind Networks

Yiming Wang, Rui Wang

TL;DR

The study investigates whether neural networks can spontaneously generalize from first-order Theory-of-Mind (ToM) to higher-order ToM without acquiring advanced cognitive skills. Using a transformer-based ToMNN trained on first-order ToM within a Sally–Anne framework, the authors show robust generalization to second- and third-order ToM across varied scene complexities and model scales, with higher-order performance substantially above random baselines. Generalization patterns—notably the sharp drop from first- to second-order and the relatively smaller decline to third-order—mirror human cognitive development, suggesting shared qualitative transitions in ToM reasoning. The work provides a scalable, graph-based data-generation pipeline and demonstrates that even modestly sized models can exhibit meaningful high-order ToM generalization, offering a foundation for building more human-like cognitive systems and informing cross-modal extensions.

Abstract

Theory-of-Mind (ToM) is a core human cognitive capacity for attributing mental states to self and others. Wimmer and Perner demonstrated that humans progress from first- to higher-order ToM within a short span, completing this development before formal education or advanced skill acquisition. In contrast, neural networks, exemplified by autoregressive language models, progress from first- to higher-order ToM only alongside gains in advanced skills such as reasoning, leaving open whether their trajectory can unfold independently, as it does in humans. In this research, we provided evidence that neural networks could spontaneously generalize from first- to higher-order ToM without relying on advanced skills. We introduced a neural Theory-of-Mind network (ToMNN) that simulated a minimal cognitive system, acquiring only first-order ToM competence. Evaluations of its second- and third-order ToM abilities showed accuracies well above chance. Moreover, ToMNN exhibited a sharper decline when generalizing from first- to second-order ToM than from second- to higher orders, and its accuracy decreased with greater task complexity. These difficulty patterns were aligned with human cognitive expectations. Furthermore, the results held consistently across different parameter scales. Our findings illuminate machine ToM generalization patterns and offer a foundation for developing more human-like cognitive systems.

Paper Structure

This paper contains 24 sections, 15 equations, 14 figures, 4 tables, 3 algorithms.

Figures (14)

  • Figure 1: Implementation pipeline of ToMNN for learning and generalization via the Sally–Anne task. (A) A complete Sally–Anne task consists of a scene and a query. During the learning phase, the model receives a background description (scene) together with a first-order query (e.g., Oliver’s true belief). Its output is compared against the ground truth, and the model is trained by maximum likelihood estimation. During the generalization phase, the scene remains unchanged but the query order is increased: the model must answer a second-order query (e.g., Oliver’s belief about Jack’s belief) and a third-order query (e.g., Jack’s belief about Oliver’s belief about Bob’s belief). (B) The same procedure as in (A), but with increased scene complexity. In our experiments, task complexity is systematically controlled by categorizing scenes into distinct complexity levels and constructing parallel experimental groups to ensure robust generalization evaluation.
  • Figure 2: Sally–Anne task complexity control and experimental group construction across different task complexities. (A) The scene of the Sally–Anne task comprises multiple static and dynamic elements, which determine the task complexity. The textual scene can be losslessly abstracted as a graph structure, and the graph can be converted back into its textual form. Graph-scale variables (e.g., number of nodes and edges) can be mapped to the scene variables, enabling precise complexity control. (B) Generalization is evaluated on parallel experimental groups across different complexity levels. From left to right, three complexity settings are shown; within each group, rows indicate (top to bottom) first-order learning accuracy, and second- and third-order generalization accuracy. Superscripts denote the expected random accuracy for each setting.
  • Figure 3: ToMNN accuracies in learning and generalization settings. Results from 12 parallel evaluation groups with varying configurations determined by the three complexity variables $(n, m, q)$ are presented. Each group contains 120 distinct graph structures abstracted from task scenes, with 112 structures included in the training data and the remaining 8 held out for testing (shaded in gray). In each sub-figure, red points denote ToMNN’s learning accuracy on first-order ToM, while blue and yellow points indicate its generalization accuracy on second- and third-order ToM, respectively. Each graph structure corresponds to a scene template enriched with diverse semantic instantiations, generating a large number of tasks. Accordingly, the accuracy of each point represents the average performance across the tasks associated with that structure. The colored lines represent mean accuracies for each order, and the black line indicates the random baseline accuracy ($1/q$).
  • Figure 4: ToMNN accuracies in learning and generalization across model parameter sizes. Each subplot shows a specific $(n,m,q)$ configuration, with the $x$-axis denoting parameter size and the $y$-axis accuracy. The red line indicates learning performance on first-order ToM ($k=1$), while the blue and yellow lines show generalization performances to second- and third-order ToM ($k=2,3$).
  • Figure 5: The pipeline of the textual concretization process under the configuration $\{(n,m), q, k \mid \mathcal{L}\}$. (Stage 1) We first screen all valid belief graph structures containing $n$ nodes and $m$ directed edges. (Stage 2) For each graph structure (after re-checking its validity), we derive the entrance-exit order of characters, i.e., the character-character interaction order. Based on this, we construct a scene template in natural language from $\mathcal{L}_0$, embedding static element placeholders: $n$ characters, one object, $q$ containers, and $n$ character-object-container interactions. (Stage 3) For each scene template, we perform semantic filling of all element placeholders from $\mathcal{L_C}, \mathcal{L_B}, \mathcal{L_A}$ to form the final "Scene". By filling the placeholders with different semantics, a single template can be expanded into diverse scenes. (Stage 4) Finally, for each "Scene", we sample a belief flow involving $k$ characters to form a "Query", and derive the corresponding ground-truth "Answer" through designed algorithms, thereby constructing a complete data sample $((\texttt{Scene}, \texttt{Query}), \texttt{Answer})$.
  • ...and 9 more figures
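
To make the query orders in the Figure 1 caption concrete, the following is a minimal sketch of how a ground-truth answer for a $k$-th order query could be derived in a toy Sally–Anne scene. It is not the paper's answer-derivation algorithm, which is not reproduced here. The `Move` record, the `nested_belief` function, and the rule it applies are illustrative assumptions: every character is assumed to have seen the initial placement, co-presence is assumed to be mutually observed, and a nested belief is taken to be the object's location after the last move witnessed by every character in the belief chain.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Move:
    """One relocation of the object (hypothetical record type, not from the paper)."""
    container: str             # where the object ends up after this move
    witnesses: FrozenSet[str]  # characters in the room during the move


def nested_belief(initial: str, moves: Sequence[Move], chain: List[str]) -> str:
    """Answer "chain[0] believes that chain[1] believes that ... the object is in X".

    Assumed rule: only moves witnessed by every character in the chain
    update the nested belief; len(chain) is the query order k.
    """
    believed = initial  # assumption: everyone saw the initial placement
    for move in moves:
        if all(name in move.witnesses for name in chain):
            believed = move.container
    return believed


# Classic scene: the marble starts in the basket; Sally leaves;
# Anne moves it to the box while alone in the room.
moves = [Move(container="box", witnesses=frozenset({"Anne"}))]

first_order_sally = nested_belief("basket", moves, ["Sally"])      # k = 1
first_order_anne = nested_belief("basket", moves, ["Anne"])        # k = 1
second_order = nested_belief("basket", moves, ["Anne", "Sally"])   # k = 2
```

In this toy scene, Sally's first-order belief stays at the initial container, while Anne's second-order belief about Sally's belief also resolves to the initial container, because Sally did not witness the move.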
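
The Figure 3 caption describes each plotted point as the average accuracy over all tasks generated from one graph structure, compared against a random baseline of $1/q$. A minimal sketch of that bookkeeping is given below. The record format (a structure identifier paired with a correctness flag) and the function names are assumptions made for illustration, and the sample records are made-up placeholders rather than results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def chance_accuracy(q: int) -> float:
    """Random-guess baseline when the answer is one of q containers."""
    if q < 1:
        raise ValueError("q must be a positive number of containers")
    return 1.0 / q


def per_structure_accuracy(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Average correctness over all tasks that share a graph structure.

    Each record is (structure_id, is_correct); this format is hypothetical.
    """
    hits: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for structure_id, is_correct in records:
        hits[structure_id] += int(is_correct)
        counts[structure_id] += 1
    return {sid: hits[sid] / counts[sid] for sid in counts}


# Placeholder records for two hypothetical structures (not real results).
records = [("g1", True), ("g1", False), ("g2", True), ("g2", True)]
accuracy = per_structure_accuracy(records)
mean_accuracy = sum(accuracy.values()) / len(accuracy)
above_chance = mean_accuracy > chance_accuracy(4)
```

Averaging per structure first, and only then across structures, keeps a structure with many semantic instantiations from dominating the mean line.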