Table of Contents
Fetching ...

Let's Put Ourselves in Sally's Shoes: Shoes-of-Others Prefilling Improves Theory of Mind in Large Language Models

Kazutoshi Shinoda, Nobukatsu Hojo, Kyosuke Nishida, Yoshihiro Yamazaki, Keita Suzuki, Hiroaki Sugiyama, Kuniko Saito

TL;DR

The paper introduces Shoes-of-Others (SoO) prefilling, an inference-time method that prefixes LLM outputs with a perspective-taking prompt to improve Theory of Mind without fine-tuning. SoO prefilling demonstrates consistent gains across five mental-state categories on two ToM benchmarks (ToMATO and ToMBench), outperforming CoT-based and prompting baselines, and remains effective across multiple model families. Analyses show improvements arise from increased faithfulness of the model's intermediate thoughts rather than mere longer reasoning, and the benefits persist without relying on extended compute. The work connects explicit perspective-taking to enhanced ToM performance and discusses implications for ASD-related research, while acknowledging limitations in scope, potential biases, and ethical considerations.

Abstract

Recent studies have shown that Theory of Mind (ToM) in large language models (LLMs) has not reached human-level performance yet. Since fine-tuning LLMs on ToM datasets often degrades their generalization, several inference-time methods have been proposed to enhance ToM in LLMs. However, existing inference-time methods for ToM are specialized for inferring beliefs from contexts involving changes in the world state. In this study, we present a new inference-time method for ToM, Shoes-of-Others (SoO) prefilling, which makes fewer assumptions about contexts and is applicable to broader scenarios. SoO prefilling simply specifies the beginning of LLM outputs with ``Let's put ourselves in A's shoes.'', where A denotes the target character's name. We evaluate SoO prefilling on two benchmarks that assess ToM in conversational and narrative contexts without changes in the world state and find that it consistently improves ToM across five categories of mental states. Our analysis suggests that SoO prefilling elicits faithful thoughts, thereby improving the ToM performance.

Let's Put Ourselves in Sally's Shoes: Shoes-of-Others Prefilling Improves Theory of Mind in Large Language Models

TL;DR

The paper introduces Shoes-of-Others (SoO) prefilling, an inference-time method that prefixes LLM outputs with a perspective-taking prompt to improve Theory of Mind without fine-tuning. SoO prefilling demonstrates consistent gains across five mental-state categories on two ToM benchmarks (ToMATO and ToMBench), outperforming CoT-based and prompting baselines, and remains effective across multiple model families. Analyses show improvements arise from increased faithfulness of the model's intermediate thoughts rather than mere longer reasoning, and the benefits persist without relying on extended compute. The work connects explicit perspective-taking to enhanced ToM performance and discusses implications for ASD-related research, while acknowledging limitations in scope, potential biases, and ethical considerations.

Abstract

Recent studies have shown that Theory of Mind (ToM) in large language models (LLMs) has not reached human-level performance yet. Since fine-tuning LLMs on ToM datasets often degrades their generalization, several inference-time methods have been proposed to enhance ToM in LLMs. However, existing inference-time methods for ToM are specialized for inferring beliefs from contexts involving changes in the world state. In this study, we present a new inference-time method for ToM, Shoes-of-Others (SoO) prefilling, which makes fewer assumptions about contexts and is applicable to broader scenarios. SoO prefilling simply specifies the beginning of LLM outputs with ``Let's put ourselves in A's shoes.'', where A denotes the target character's name. We evaluate SoO prefilling on two benchmarks that assess ToM in conversational and narrative contexts without changes in the world state and find that it consistently improves ToM across five categories of mental states. Our analysis suggests that SoO prefilling elicits faithful thoughts, thereby improving the ToM performance.

Paper Structure

This paper contains 35 sections, 12 figures, 12 tables.

Figures (12)

  • Figure 1: Shoes-of-Others prefilling specifies the beginning of outputs and then LLMs generate the continuation. The above example from ToMATO shinoda2025tomato illustrates that Shoes-of-Others encourages the generation of faithful thoughts (i.e., the reasoning process accurately explains its prediction), thereby improving performance. See §\ref{['sec:analysis']} for in-depth analyses.
  • Figure 2: Correlation analysis of accuracy and faithfulness for Llama-3-8B-Instruct. The correlation between the two is positive on both benchmarks.
  • Figure 3: Correlation analysis of accuracy and thought length for Llama-3-8B-Instruct. The correlation between the two is not necessarily positive.
  • Figure 4: Distributions of the number of tokens in thoughts generated with CoT and SoO.
  • Figure 5: Statistical token-level correlation analysis in thoughts generated by Llama3 7B.
  • ...and 7 more figures