Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities

Alex Wilf, Sihyun Shawn Lee, Paul Pu Liang, Louis-Philippe Morency

TL;DR

This work tackles the challenge of enabling theory-of-mind reasoning in large language models. It introduces SimToM, a two-stage prompting framework inspired by Simulation Theory that first performs perspective-taking to filter context, then answers questions from that perspective, requiring no model fine-tuning. Across ToMI and BigTOM benchmarks, SimToM yields substantial improvements over 0-shot and single-pass prompting, with domain-specific prompts and oracle perspectives providing further gains. The findings highlight perspective-taking as a core lever for enhancing LLM ToM capabilities and point to promising directions for future research and application in socially aware AI systems.

Abstract

Human interactions are deeply rooted in the interplay of thoughts, beliefs, and desires made possible by Theory of Mind (ToM): our cognitive ability to understand the mental states of ourselves and others. Although ToM may come naturally to us, emulating it presents a challenge to even the most advanced Large Language Models (LLMs). Recent improvements to LLMs' reasoning capabilities from simple yet effective prompting techniques such as Chain-of-Thought have seen limited applicability to ToM. In this paper, we turn to the prominent cognitive science theory "Simulation Theory" to bridge this gap. We introduce SimToM, a novel two-stage prompting framework inspired by Simulation Theory's notion of perspective-taking. To implement this idea on current ToM benchmarks, SimToM first filters context based on what the character in question knows before answering a question about their mental state. Our approach, which requires no additional training and minimal prompt-tuning, shows substantial improvement over existing methods, and our analysis reveals the importance of perspective-taking to Theory-of-Mind capabilities. Our findings suggest perspective-taking as a promising direction for future research into improving LLMs' ToM capabilities.

Paper Structure

This paper contains 43 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Instead of performing Theory-of-Mind question-answering in a single inference pass, SimToM first prompts LLMs to perform perspective-taking: filtering the context down to only what the character in question knows. Then, the LLM answers the question given this filtered context. The example in this figure is representative of the core idea underlying current benchmarks used to gauge LLMs' ToM capabilities, called the Sally-Anne false-belief tests (Baron-Cohen et al., 1985).
  • Figure 2: An overview of SimToM, a two-stage prompting framework for enhancing zero-shot Theory-of-Mind capabilities in LLMs. The first step is perspective-taking, in which a model attempts to understand what the agent knows and wants. We then query the LLM to infer the answer to the question given this perspective.
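
The two stages described in the figure captions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the prompt templates, the function name `simtom_answer`, and the `llm` callable (any function that maps a prompt string to a completion string) are assumptions introduced here, and the paper's actual prompt wording is not reproduced.

```python
from typing import Callable

# Hypothetical prompt templates for illustration only; they are not the
# prompts used in the paper.
PERSPECTIVE_PROMPT = (
    "The following is a sequence of events:\n{story}\n\n"
    "Which of these events does {character} know about? "
    "List only those events."
)
QUESTION_PROMPT = (
    "{perspective}\n\n"
    "You are {character}. Based only on the events above, "
    "answer the following question:\n{question}"
)


def simtom_answer(
    llm: Callable[[str], str],
    story: str,
    character: str,
    question: str,
) -> str:
    """Answer a theory-of-mind question with two LLM calls.

    `llm` is an assumed interface: any callable that takes a prompt
    string and returns the model's text completion.
    """
    # Stage 1 (perspective-taking): ask the model to filter the story
    # down to the events the character knows about.
    perspective = llm(
        PERSPECTIVE_PROMPT.format(story=story, character=character)
    )
    # Stage 2 (question-answering): answer using only the filtered
    # context, so the full story is not shown again.
    return llm(
        QUESTION_PROMPT.format(
            perspective=perspective,
            character=character,
            question=question,
        )
    )
```

The key design choice in this sketch is that the second call receives only the output of the first, never the original story, so information the character could not have observed is kept out of the answering step. In a Sally-Anne style story, for example, the event in which Anne moves the marble while Sally is away would ideally be dropped in stage 1.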