
PHAnToM: Persona-based Prompting Has An Effect on Theory-of-Mind Reasoning in Large Language Models

Fiona Anting Tan, Gerard Christopher Yeo, Kokil Jaidka, Fanyou Wu, Weijie Xu, Vinija Jain, Aman Chadha, Yang Liu, See-Kiong Ng

TL;DR

This study empirically evaluates how role-playing persona-based prompting influences Theory-of-Mind (ToM) reasoning capabilities and finds that, beyond the inherent variance in the complexity of reasoning tasks, ToM performance differences arise because of socially-motivated prompting differences.

Abstract

The use of LLMs in natural language reasoning has shown mixed results: they sometimes rival or even surpass human performance on simpler classification tasks while struggling with social-cognitive reasoning, a domain where humans naturally excel. These differences have been attributed to many factors, such as variations in prompting and the specific LLMs used. However, none of these explanations appears conclusive, and prior work has not established clear mechanisms. In this study, we empirically evaluate how role-playing prompting influences Theory-of-Mind (ToM) reasoning capabilities. Grounding our research in psychological theory, we propose the mechanism that, beyond the inherent variance in the complexity of reasoning tasks, performance differences arise because of socially-motivated prompting differences. In an era where prompt engineering with role-play is a common approach to adapting LLMs to new contexts, our research advocates caution, as models that adopt specific personas may make errors in social-cognitive reasoning.

Paper Structure

This paper contains 29 sections, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: Overview of PHAnToM. Our work investigates how eight different persona-based prompts (Big Five OCEAN and Dark Triad) affect LLMs' ability to perform three theory-of-mind reasoning tasks (Information Access (IA), Answerability (AA), and Belief Understanding (BU)).
  • Figure 2: Heatmap of MPI120 scores for the Big Five OCEAN traits (x-axis) when models are prompted with different personalities (y-axis). Scores range from 0 (Blue) to 5 (Red).
  • Figure 3: Median performance change across models, relative to each model's baseline performance without persona-based prompting.
  • Figure 4: Sensitivity of models to persona-based prompts on the Answerability task.
  • Figure 5: Cumulative effects of personality traits on model performance for the Answerability task. The values are normalized using z-scores.
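Figures 3 and 5 rest on two simple computations: a performance change measured against each model's no-persona baseline (summarized by its median across models), and z-score normalization. The sketch below illustrates both under stated assumptions: the change is taken as the plain difference (persona score minus baseline score), z-scores use the population standard deviation, and the model names and accuracy values are hypothetical placeholders, not results from the paper. The paper's exact definitions may differ.

```python
from statistics import mean, median, pstdev

# HYPOTHETICAL accuracies for one task (placeholders, not the paper's results).
# Keys are made-up model names; values are task accuracies in [0, 1].
BASELINE = {"model_a": 0.70, "model_b": 0.55, "model_c": 0.62}  # no persona prompt
PERSONA = {"model_a": 0.64, "model_b": 0.58, "model_c": 0.52}   # with one persona prompt


def median_change(persona, baseline):
    """Median over models of (persona score - baseline score).

    Assumes the change is an absolute difference; a relative (percentage)
    change would divide each delta by the baseline score instead.
    """
    deltas = [persona[m] - baseline[m] for m in baseline]
    return median(deltas)


def z_scores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: every standardized value is 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    print(median_change(PERSONA, BASELINE))
    print(z_scores([1.0, 2.0, 3.0]))
```

Here `median_change(PERSONA, BASELINE)` yields a single signed summary per persona and task (negative means the persona hurt the typical model), and `z_scores` puts effects measured on different scales onto a common one so they can be compared side by side. Taking the median rather than the mean keeps one unusually sensitive model from dominating the summary.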