When Researchers Say Mental Model/Theory of Mind of AI, What Are They Really Talking About?
Xiaoyun Yin, Elmira Zahmat Doost, Shiwen Zhou, Garima Arya Yadav, Jamie C. Gorman
TL;DR
This paper argues that claims of AI Theory of Mind reflect behavioral mimicry rather than genuine cognitive states, challenging the validity of current ToM benchmarks that rely on static, third-person tasks. It critiques the transplantation of human cognitive tests to AI and promotes a mutual ToM framework that centers on interaction dynamics and bidirectional adaptation between humans and AI. By synthesizing findings from related work, it shows that AI's isolated ToM capabilities do not reliably enhance team performance, whereas improved mutual understanding and collaboration can positively impact outcomes, albeit sometimes increasing workload. The practical implication is a shift in research and design toward systems that support mutual adaptation and cohesive human–AI collaboration, rather than pursuing AI as a stand-in for human social cognition.
Abstract
When researchers claim that AI systems possess Theory of Mind (ToM) or mental models, they are fundamentally discussing behavioral predictions and bias corrections rather than genuine mental states. This position paper argues that the current discourse conflates sophisticated pattern matching with authentic cognition, missing a crucial distinction between simulation and experience. While recent studies show LLMs achieving human-level performance on ToM laboratory tasks, these results are based only on behavioral mimicry. More importantly, the entire testing paradigm may be flawed: it applies individual human cognitive tests to AI systems rather than assessing human cognition directly in the moment of human-AI interaction. We suggest shifting focus toward mutual ToM frameworks that acknowledge the simultaneous contributions of human cognition and AI algorithms and emphasize interaction dynamics instead of testing AI in isolation.
