Spontaneous High-Order Generalization in Neural Theory-of-Mind Networks
Yiming Wang, Rui Wang
TL;DR
The study investigates whether neural networks can spontaneously generalize from first-order Theory-of-Mind (ToM) to higher-order ToM without acquiring advanced cognitive skills. Using a transformer-based ToMNN trained only on first-order ToM within a Sally–Anne framework, the authors show robust generalization to second- and third-order ToM across varied scene complexities and model scales, with higher-order performance substantially above random baselines. The generalization patterns, notably a sharp accuracy drop from first- to second-order and a smaller decline from second- to third-order, mirror human cognitive development, suggesting shared qualitative transitions in ToM reasoning. The work provides a scalable, graph-based data-generation pipeline and demonstrates that even modestly sized models can exhibit meaningful high-order ToM generalization, offering a foundation for building more human-like cognitive systems and for informing cross-modal extensions.
Abstract
Theory-of-Mind (ToM) is a core human cognitive capacity for attributing mental states to oneself and others. Wimmer and Perner demonstrated that humans progress from first- to higher-order ToM within a short span, completing this development before formal education or advanced skill acquisition. In contrast, neural networks, as exemplified by autoregressive language models, progress from first- to higher-order ToM only alongside gains in advanced skills such as reasoning, leaving open whether this trajectory can unfold independently, as it does in humans. In this research, we provide evidence that neural networks can spontaneously generalize from first- to higher-order ToM without relying on advanced skills. We introduce a neural Theory-of-Mind network (ToMNN) that simulates a minimal cognitive system and acquires only first-order ToM competence. Evaluations of its second- and third-order ToM abilities show accuracies well above chance. ToMNN also exhibits a sharper decline when generalizing from first- to second-order ToM than from second- to higher orders, and its accuracy decreases with greater task complexity. These difficulty patterns align with what is expected from human cognition. Furthermore, the results hold across different parameter scales. Our findings illuminate machine ToM generalization patterns and offer a foundation for developing more human-like cognitive systems.
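To make the distinction between ToM orders concrete, the following is a minimal rule-based sketch of a Sally–Anne scenario. It is not the ToMNN model or the authors' data pipeline; the names `Move` and `nested_belief` are hypothetical, and the sketch assumes that agents only learn about moves they witness and that everyone present at a move knows who else is present. Under those assumptions, a nested belief for a chain of agents reduces to the last move that every agent in the chain witnessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """One event: an object is placed somewhere while some agents watch.

    Hypothetical structure for illustration only.
    """
    obj: str
    dest: str
    witnesses: frozenset


def nested_belief(events, chain, obj):
    """Where chain[0] thinks chain[1] thinks ... the object is.

    Assumption: a nested belief is updated only by moves that every
    agent in the chain witnessed. An empty chain gives the true location.
    Returns None if no such move exists.
    """
    location = None
    for ev in events:
        if ev.obj == obj and all(agent in ev.witnesses for agent in chain):
            location = ev.dest
    return location


# Sally and Anne both see the marble go into the basket.
# Sally leaves, and Anne alone moves the marble to the box.
events = [
    Move("marble", "basket", frozenset({"Sally", "Anne"})),
    Move("marble", "box", frozenset({"Anne"})),
]

true_location = nested_belief(events, [], "marble")
first_order = nested_belief(events, ["Sally"], "marble")
second_order = nested_belief(events, ["Anne", "Sally"], "marble")

print(true_location, first_order, second_order)
```

Here the first-order query asks where Sally thinks the marble is, and the second-order query asks where Anne thinks Sally thinks it is. A third-order query simply uses a three-agent chain.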
