Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning
Yihong Tang, Kehai Chen, Muyun Yang, Zhengyu Niu, Jing Li, Tiejun Zhao, Min Zhang
TL;DR
This work tackles the problem of RPAs lacking deep, human-like internal thinking by introducing Role-Aware Reasoning (RAR). RAR combines Role Identity Activation (RIA), which anchors reasoning in a character's core traits, with Reasoning Style Optimization (RSO), which adapts the thinking style to the scene context. By distilling reasoning traces from a large reasoning model (LRM) into a smaller LLM, RAR mitigates attention diversion and style drift, enabling more consistent and believable character portrayals. Extensive experiments on RoleBench-derived data and standard RPA benchmarks (CharacterBench and SocialBench) show that RAR improves persona fidelity, knowledge recall, and social judgment, and ablations confirm that RIA and RSO play complementary roles. The results suggest that explicitly guiding internal reasoning is a promising direction for enhancing complex generative tasks such as role-playing, with potential extensions to finer-grained character attributes, long-term memory, and larger teacher models.
Abstract
The advancement of Large Language Models (LLMs) has spurred significant interest in Role-Playing Agents (RPAs) for applications such as emotional companionship and virtual interaction. However, current RPAs are typically built on explicit dialogue data and lack deep, human-like internal thought processes, which results in superficial expression of character knowledge and style. Large Reasoning Models (LRMs) can be employed to simulate a character's thoughts, but applying them directly is hindered by attention diversion (i.e., the RPA forgets its role) and style drift (i.e., the reasoning is overly formal and rigid rather than consistent with the character). To address these challenges, this paper introduces a novel Role-Aware Reasoning (RAR) method consisting of two stages: Role Identity Activation (RIA) and Reasoning Style Optimization (RSO). RIA explicitly guides the model with the character profile during reasoning to counteract attention diversion, and RSO then aligns the reasoning style with the character and scene via LRM distillation to mitigate style drift. Extensive experiments demonstrate that RAR significantly improves RPA performance by effectively addressing both attention diversion and style drift.
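To make the two stages more concrete, the following is a minimal, hypothetical sketch of how distillation data for a role-aware reasoner might be assembled. It is not the authors' implementation: the names `CharacterProfile`, `STYLE_HINTS`, `role_identity_block`, `build_teacher_prompt`, and `make_sft_example`, the prompt wording, the scene categories, and the `<think>...</think>` wrapper for reasoning traces are all illustrative assumptions. The identity block stands in for RIA (the character profile is kept in view during reasoning), and the scene-dependent style hint given to the teacher LRM stands in for RSO.

```python
from dataclasses import dataclass


@dataclass
class CharacterProfile:
    """Hypothetical minimal character profile."""
    name: str
    traits: list[str]
    background: str


# Hypothetical scene-type -> reasoning-style hints (an assumed stand-in for RSO).
STYLE_HINTS = {
    "emotional": "Reason with feeling, in the character's own voice.",
    "logical": "Reason step by step, but keep the character's tone.",
}
DEFAULT_STYLE = "Reason briefly and in character."


def role_identity_block(profile: CharacterProfile) -> str:
    """Assumed RIA-style prefix that keeps the character's identity in view."""
    traits = ", ".join(profile.traits)
    return (
        f"You are {profile.name}. Core traits: {traits}.\n"
        f"Background: {profile.background}\n"
        f"Before answering, think as {profile.name} would."
    )


def build_teacher_prompt(profile: CharacterProfile, scene_type: str, user_turn: str) -> str:
    """Prompt for the teacher LRM: identity block plus a scene-dependent style hint."""
    style = STYLE_HINTS.get(scene_type, DEFAULT_STYLE)
    return f"{role_identity_block(profile)}\nReasoning style: {style}\nUser: {user_turn}"


def make_sft_example(profile: CharacterProfile, user_turn: str,
                     teacher_reasoning: str, teacher_reply: str) -> dict:
    """Pack a teacher trace into a student fine-tuning pair.

    Assumed design choice: the student input omits the explicit style hint,
    so the student must internalize the style from the distilled traces.
    """
    return {
        "input": f"{role_identity_block(profile)}\nUser: {user_turn}",
        "target": f"<think>{teacher_reasoning}</think>\n{teacher_reply}",
    }


if __name__ == "__main__":
    holmes = CharacterProfile("Sherlock Holmes", ["observant", "aloof"],
                              "Consulting detective in London.")
    print(build_teacher_prompt(holmes, "logical", "Who took the letter?"))
```

In this sketch, the teacher sees the style hint while the student sees only the profile and user turn. This is one plausible way to push style adaptation into the student's learned behavior. Whether the paper conditions the student the same way is not stated in this section.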
