Persona is a Double-edged Sword: Mitigating the Negative Impact of Role-playing Prompts in Zero-shot Reasoning Tasks

Junseok Kim; Nakyeong Yang; Kyomin Jung

TL;DR

This work investigates the paradoxical effects of role-playing prompts on zero-shot reasoning and introduces Jekyll & Hyde, a framework that automatically generates a persona with an LLM, runs two solvers (one with the persona, one without), and uses an evaluator designed to mitigate position bias to select the better solution. The approach consistently improves reasoning across twelve diverse datasets and multiple backbone models, while reducing sensitivity to persona quality and position bias. Key contributions include automatic persona generation, dual-perspective solving, and a consistency-based evaluation protocol that approaches oracle performance in many cases. The framework highlights the potential of ensemble prompting to strengthen LLM reasoning and offers practical guidance on maintaining stability and efficiency in human-AI systems.
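"Oracle performance" here refers to an upper bound for any method that chooses between the two candidate solutions. The sketch below assumes the oracle counts a question as solved whenever at least one of the two solvers (persona or neutral) answers it correctly, and compares that bound with the accuracy of a concrete selector. The function names and the toy answers are hypothetical illustrations, not the paper's code or data.

```python
from typing import Sequence


def accuracy(preds: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold answers."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def oracle_accuracy(
    persona_preds: Sequence[str], neutral_preds: Sequence[str], gold: Sequence[str]
) -> float:
    """Assumed oracle: a question counts as solved if either the persona
    solver or the neutral solver answered it correctly."""
    hits = sum(
        p == g or n == g for p, n, g in zip(persona_preds, neutral_preds, gold)
    )
    return hits / len(gold)


def selection_accuracy(
    persona_preds: Sequence[str],
    neutral_preds: Sequence[str],
    choices: Sequence[str],
    gold: Sequence[str],
) -> float:
    """Accuracy of a selector whose choices[i] is "persona" or "neutral"."""
    picked = [
        p if c == "persona" else n
        for p, n, c in zip(persona_preds, neutral_preds, choices)
    ]
    return accuracy(picked, gold)


# Toy, made-up answers for four questions (not results from the paper).
gold = ["12", "7", "B", "yes"]
persona = ["12", "8", "B", "no"]
neutral = ["11", "7", "B", "no"]
choices = ["persona", "neutral", "persona", "neutral"]

print(accuracy(persona, gold))                              # 0.5
print(accuracy(neutral, gold))                              # 0.5
print(oracle_accuracy(persona, neutral, gold))              # 0.75
print(selection_accuracy(persona, neutral, choices, gold))  # 0.75
```

In this toy case each solver alone scores 0.5, but no selector can exceed 0.75 because neither solver answers the last question correctly; a selector that reaches 0.75 matches the oracle.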

Abstract

Recent studies demonstrate that assigning a role-playing persona to an LLM through prompting improves its reasoning capability. However, choosing an adequate persona is difficult because LLMs are extremely sensitive to their prompts; inaccurately defined personas can therefore hinder LLMs and degrade their reasoning capabilities. In this paper, we first investigate the potential negative impact of injecting a persona into language models. We then propose a novel framework, Jekyll & Hyde, which ensembles the outcomes of both role-playing and neutral prompts to make reasoning more robust. Specifically, Jekyll & Hyde uses an LLM to predict an appropriate persona when defining the role-playing prompt. It then collects two candidate solutions, one from the role-playing prompt and one from the neutral prompt, and uses an LLM evaluator to select the better one. The experimental analysis demonstrates that role-playing prompts sometimes distract LLMs, degrading their reasoning abilities on 7 out of 12 datasets with Llama 3. Meanwhile, Jekyll & Hyde improves reasoning capabilities on twelve widely used natural language reasoning datasets by selecting the better of the candidate solutions. In addition, we show that assigning LLM-generated personas yields more stable results than assigning handcrafted personas.
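The three steps described in the abstract (persona prediction, dual solving, evaluation) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: `LLM` is a hypothetical text-in/text-out callable standing in for any chat model, the prompt wordings are illustrative rather than the paper's prompts, and the evaluator is a simplified single-pass comparison without the paper's position-bias mitigation. The `fake_llm` at the end is a scripted stand-in so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: any function mapping a prompt string to a reply.
LLM = Callable[[str], str]


@dataclass
class Result:
    persona: str
    persona_answer: str
    neutral_answer: str
    chosen: str  # "persona" or "neutral"
    final_answer: str


def generate_persona(llm: LLM, question: str) -> str:
    # Step 1: let the LLM itself propose a suitable expert persona.
    prompt = f"Name the expert persona best suited to answer:\n{question}"
    return llm(prompt).strip()


def solve(llm: LLM, question: str, persona: Optional[str] = None) -> str:
    # Step 2: the same question, with or without a role-playing prefix.
    prefix = f"You are {persona}. " if persona else ""
    return llm(f"{prefix}Answer the question.\n{question}").strip()


def evaluate(llm: LLM, question: str, first: str, second: str) -> int:
    # Step 3 (simplified, single pass): the evaluator replies 1 or 2.
    reply = llm(
        f"Question: {question}\nSolution 1: {first}\nSolution 2: {second}\n"
        "Which solution is correct? Reply with 1 or 2."
    )
    return 2 if reply.strip().startswith("2") else 1


def jekyll_and_hyde(llm: LLM, question: str) -> Result:
    persona = generate_persona(llm, question)
    persona_answer = solve(llm, question, persona)  # Persona Solver
    neutral_answer = solve(llm, question)           # Neutral Solver
    pick = evaluate(llm, question, persona_answer, neutral_answer)
    chosen = "persona" if pick == 1 else "neutral"
    final = persona_answer if pick == 1 else neutral_answer
    return Result(persona, persona_answer, neutral_answer, chosen, final)


def fake_llm(prompt: str) -> str:
    # Scripted stand-in for a real model, for illustration only.
    if prompt.startswith("Name the expert persona"):
        return "a civil engineer"
    if prompt.startswith("You are a civil engineer."):
        return "42 m"
    if prompt.startswith("Answer the question."):
        return "40 m"
    return "2"  # evaluator prefers Solution 2 (the neutral one)


result = jekyll_and_hyde(fake_llm, "How long is the beam?")
print(result.chosen, result.final_answer)  # neutral 40 m
```

Keeping the model behind a plain callable makes it easy to swap in any client, and to replace `evaluate` with a position-bias-aware version without touching the solvers.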

Paper Structure

This paper contains 40 sections, 3 equations, 7 figures, and 16 tables.

Introduction
Related Works
Role-playing Abilities of LLMs
Analysis on Role-playing Prompts
Methods
Automatic Identification of Persona
Generating Personated and Neutral Perspective Solutions
Aggregating Solutions of Two Solvers
Robust Evaluation via Consistency Verification
Experiments
Experimental Setup
Datasets.
Models.
Implementation Details.
Persona does not always improve the performance of an LLM
...and 25 more sections

Figures (7)

Figure 1: Persona is a Double-edged Sword. Prior studies show that assigning a role-playing prompt to an LLM improves its performance; however, assigning a persona can sometimes lead the LLM to a wrong answer. In this example, given a mathematical problem related to civil engineering, the LLM is assigned the persona "Civil Engineer" and derives the wrong answer.
Figure 2: The architecture of Jekyll & Hyde. Jekyll & Hyde uses both a persona-assigned LLM (Persona Solver) and an LLM without a persona prompt (Neutral Solver), which together provide a dual perspective on the given question. This structure raises the performance the model can potentially reach. After both solvers run, a robust Evaluator designed to mitigate position bias selects the better of the two solutions.
Figure 3: Win rate for each category of datasets. Using role-playing prompts occasionally degrades the performance of an LLM, making it difficult to determine their overall effectiveness.
Figure 4: Hyper-parameter experiments. Variation of average accuracy with (a) the maximum number of attempts $k$ and (b) the temperature $\tau$ of the LLM used as the evaluator. The x- and y-axes correspond to the hyper-parameter setting and accuracy, respectively.
Figure 5: The entire process of how the Solver works.
...and 2 more figures
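Figure 2 describes an Evaluator designed to mitigate position bias, and Figure 4 varies the maximum number of attempts $k$ and the evaluator temperature $\tau$. The sketch below shows one way such a consistency check could work, under assumptions that are ours rather than the paper's: the judge is queried with the two solutions in both orders, a verdict is accepted only when both orders favor the same underlying solution, inconsistent rounds are retried up to $k$ times, and if no consistent verdict emerges a hypothetical `fallback` choice is returned. `judge`, `consistent_select`, and `fallback` are illustrative names; a real `judge` would wrap an LLM sampled at temperature $\tau$.

```python
from typing import Callable, Tuple

# Hypothetical judge: (question, solution_1, solution_2) -> 1 or 2.
# In practice this would wrap an LLM sampled at temperature tau.
Judge = Callable[[str, str, str], int]


def consistent_select(
    judge: Judge,
    question: str,
    persona_sol: str,
    neutral_sol: str,
    k: int = 5,
    fallback: str = "neutral",
) -> Tuple[str, int]:
    """Return ("persona" or "neutral", number of attempts used).

    Each attempt queries the judge twice with the two solutions in swapped
    order and accepts the verdict only if both orders favor the same one.
    """
    for attempt in range(1, k + 1):
        forward = judge(question, persona_sol, neutral_sol)   # persona first
        backward = judge(question, neutral_sol, persona_sol)  # neutral first
        pick_forward = "persona" if forward == 1 else "neutral"
        pick_backward = "neutral" if backward == 1 else "persona"
        if pick_forward == pick_backward:
            return pick_forward, attempt
    # No consistent verdict within k attempts: hypothetical fallback rule.
    return fallback, k


def always_first(question: str, s1: str, s2: str) -> int:
    # A maximally position-biased judge: always prefers whichever comes first.
    return 1


def prefers_42(question: str, s1: str, s2: str) -> int:
    # A content-based toy judge: prefers the solution containing "42".
    return 1 if "42" in s1 else 2


print(consistent_select(always_first, "q", "42 m", "40 m", k=3))  # ('neutral', 3)
print(consistent_select(prefers_42, "q", "42 m", "40 m", k=3))    # ('persona', 1)
```

The purely position-biased judge never produces a consistent verdict and falls through to the fallback, while the content-based judge agrees with itself on the first attempt. With a deterministic judge a failed attempt simply repeats, so the retry budget $k$ only matters when the evaluator samples at a nonzero temperature $\tau$.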
