On the Decision-Making Abilities in Role-Playing using Large Language Models

Chenglei Shen, Guofu Xie, Xiao Zhang, Jun Xu

TL;DR

This work tackles the problem of quantifying how large language models decide and behave after role‑playing, by attaching 16 MBTI personas and assessing four concrete dimensions: adaptability, exploration–exploitation trade‑off, reasoning, and safety. It introduces four corresponding quantitative operations and uses Foursquare POI data, two‑armed bandits with Kalman filtering, MMLU reasoning, and SD‑3 safety tests to generate measurable signals across MBTI roles; GPT‑4 provides interpretative analysis linking performance to persona. The key contributions are the explicit definitions and measurement pipelines for each dimension, and the empirical finding that decision‑making capabilities systematically differ across emulated roles, highlighting a robust link between role embodiment and sociological traits. The results offer a practical foundation for designing and evaluating LLM‑based agents that impersonate diverse roles while acknowledging sociological patterns in their decision processes, with implications for safety and alignment in role‑driven AI applications.

Abstract

Large language models (LLMs) are now increasingly utilized for role-playing tasks, especially in impersonating domain-specific experts, primarily through role-playing prompts. When interacting in real-world scenarios, the decision-making abilities of a role significantly shape its behavioral patterns. In this paper, we concentrate on evaluating the decision-making abilities of LLMs post role-playing, thereby validating the efficacy of role-playing. Our goal is to provide metrics and guidance for enhancing the decision-making abilities of LLMs in role-playing tasks. Specifically, we first use LLMs to generate virtual role descriptions corresponding to the 16 personality types of the Myers-Briggs Type Indicator (abbreviated as MBTI), representing a segmentation of the population. Then we design specific quantitative operations to evaluate the decision-making abilities of LLMs post role-playing from four aspects: adaptability, exploration & exploitation trade-off ability, reasoning ability, and safety. Finally, we analyze the association between the performance of decision-making and the corresponding MBTI types through GPT-4. Extensive experiments demonstrate stable differences in the four aspects of decision-making abilities across distinct roles, signifying a robust correlation between decision-making abilities and the roles emulated by LLMs. These results underscore that LLMs can effectively impersonate varied roles while embodying their genuine sociological characteristics.
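
The TL;DR mentions two-armed bandits with Kalman filtering as the probe for the exploration & exploitation trade-off. As a rough illustration only, the sketch below tracks a posterior mean and variance for each arm with a scalar Kalman update and records two features before every choice: the value difference `V` and the relative uncertainty `RU`. The function names, the prior and noise settings, and the choice of `V` and `RU` as features are assumptions made for this sketch (they are common in bandit analyses); this summary does not state the exact quantities the paper uses.

```python
import math


def kalman_update(mean, var, reward, noise_var):
    """One scalar Kalman step for a stationary arm; returns (new_mean, new_var)."""
    gain = var / (var + noise_var)          # how much to trust the new reward
    new_mean = mean + gain * (reward - mean)
    new_var = (1.0 - gain) * var            # uncertainty shrinks after each pull
    return new_mean, new_var


def run_bandit(choices, rewards, prior_mean=0.0, prior_var=100.0, noise_var=10.0):
    """Replay a choice/reward history for two arms (0 and 1).

    Returns (features, means, variances), where features[t] = (V, RU)
    is computed from the posterior *before* the choice on trial t.
    Prior and noise values are hypothetical defaults, not the paper's.
    """
    means = [prior_mean, prior_mean]
    variances = [prior_var, prior_var]
    features = []
    for arm, reward in zip(choices, rewards):
        value_diff = means[0] - means[1]                                  # V
        rel_uncertainty = math.sqrt(variances[0]) - math.sqrt(variances[1])  # RU
        features.append((value_diff, rel_uncertainty))
        means[arm], variances[arm] = kalman_update(
            means[arm], variances[arm], reward, noise_var
        )
    return features, means, variances


if __name__ == "__main__":
    feats, means, variances = run_bandit([0, 1, 0], [11.0, 0.0, 11.0])
    for t, (v, ru) in enumerate(feats):
        print(f"trial {t}: V={v:.3f} RU={ru:.3f}")
```

One possible follow-up, again as an assumption rather than the paper's procedure, is to regress the recorded choices on `V` and `RU`: a larger weight on `V` would suggest exploitation, while a larger weight on `RU` would suggest uncertainty-directed exploration.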

Paper Structure

This paper contains 25 sections, 4 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: The four dimensions of decision-making abilities, including adaptability, E&E ability, reasoning ability, and safety.
  • Figure 2: Illustration of "flexibility" and "stability". Note that each colored block represents a kind of user preference. The x-axis represents the cycle, whereas the y-axis represents the time block within each cycle. The orange dotted lines indicate the scope considered for "flexibility", which means the variation of user preferences across different time blocks within the same cycle, and the blue dotted lines indicate the scope considered for "stability", which means the constancy of user preferences during the same time block across different cycles.
  • Figure 3: Experimental results on Foursquare. Note that each colour represents one POI category. The horizontal axis represents the day, whereas the vertical axis represents the time block in a day.
  • Figure 4: Specific results on Foursquare. Note that the blue bars represent the Stability score and the yellow bars represent the Flexibility score.
  • Figure 5: The exploration and exploitation proportions of post role-playing LLMs within the four dimensions of MBTI. Note that the top panel shows exploration while the bottom panel shows exploitation. The green square represents the first group in each dimension (e.g., 'E' in 'E/I'), while the blue circle denotes the second group (e.g., 'I' in 'E/I'). The x-axis represents the four MBTI dimensions, and the length of each bar denotes the normalized coefficients for each respective group.
  • ...and 6 more figures
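
The Figure 2 caption describes "stability" as constancy of a preference in the same time block across cycles, and "flexibility" as variation of preferences across time blocks within one cycle. The sketch below turns those two descriptions into simple fractions over a grid of categories. The scoring formulas, the function names, and the example grid are assumptions made for illustration; the paper's own quantitative definitions are not given in this summary and may differ.

```python
def stability(grid):
    """Fraction of adjacent-cycle pairs, within the same time block,
    whose preferred category is unchanged.

    grid[b][c] is the category chosen in time block b of cycle c.
    """
    same = total = 0
    for row in grid:                          # one row per time block
        for prev, curr in zip(row, row[1:]):  # consecutive cycles
            total += 1
            same += prev == curr
    return same / total if total else 0.0


def flexibility(grid):
    """Fraction of adjacent time-block pairs, within the same cycle,
    whose preferred category differs."""
    diff = total = 0
    for upper, lower in zip(grid, grid[1:]):  # consecutive time blocks
        for a, b in zip(upper, lower):        # same cycle in both blocks
            total += 1
            diff += a != b
    return diff / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical example: 3 time blocks (rows) x 3 cycles (columns).
    example = [
        ["cafe", "cafe", "cafe"],
        ["gym", "gym", "park"],
        ["bar", "bar", "bar"],
    ]
    print("stability:", round(stability(example), 3))
    print("flexibility:", round(flexibility(example), 3))
```

Under this toy scoring, a role that repeats the same category in a given time block every cycle scores high on stability, and a role that switches category from one time block to the next scores high on flexibility; the two scores are computed independently, so a role can be high on both.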