Memory-Driven Role-Playing: Evaluation and Enhancement of Persona Knowledge Utilization in LLMs

Kai Wang, Haoyang You, Yang Zhang, Zhongjie Wang

Abstract

A core challenge for faithful LLM role-playing is sustaining consistent characterization throughout long, open-ended dialogues: models frequently fail to recall and accurately apply their designated persona knowledge unless explicitly cued. To address this, we propose the Memory-Driven Role-Playing paradigm. Inspired by Stanislavski's "emotional memory" acting theory, the paradigm frames persona knowledge as the LLM's internal memory store, which must be retrieved and applied based solely on dialogue context. This provides a rigorous test of how deeply and how autonomously a model uses that knowledge. Centered on this paradigm, we contribute: (1) MREval, a fine-grained evaluation framework assessing four memory-driven abilities: Anchoring, Recalling, Bounding, and Enacting; (2) MRPrompt, a prompting architecture that guides structured memory retrieval and response generation; and (3) MRBench, a bilingual (Chinese/English) benchmark for fine-grained diagnosis. The paradigm yields a comprehensive diagnosis of these four staged role-playing abilities across 12 LLMs. Crucially, experiments show that MRPrompt allows small models (e.g., Qwen3-8B) to match the performance of much larger closed-source LLMs (e.g., Qwen3-Max and GLM-4.7), and confirm that upstream memory gains directly enhance downstream response quality, validating the staged theoretical foundation.

Paper Structure (91 sections, 1 equation, 19 figures, 21 tables)

Figures (19)

  • Figure 1: Three Issues in LLM Role-Playing Paradigm
  • Figure 2: Overview. Given two parts of memory, an LLM performs memory-driven role-playing in four stages to generate an in-character response. MRPrompt provides structured persona memory and a memory utilization protocol. MREval scores eight stage-aligned metrics on the bilingual benchmark MRBench, using an LLM-as-a-judge to assign per-metric scores.
  • Figure 3: MRPrompt. The raw persona description is structured as LTM via the Narrative Schema and provided together with STM for role-playing. The Magic-If Protocol guides an LLM to generate responses following four stages.
  • Figure 4: All-model radar profiles on MRBench (MRPrompt). Eight-axis MREval metric profiles for English and Chinese with a shared legend.
  • Figure 5: Ability correlations. Heatmap of Pearson correlations ($r$) between ability-level scores over multiple models.
  • ...and 14 more figures
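
The Figure 5 caption refers to Pearson correlations ($r$) between ability-level scores across models. As a minimal sketch of how such a correlation matrix can be computed, the snippet below applies the standard Pearson formula to a small table of scores. The score values, the `SCORES` layout, and the helper names `pearson_r` and `correlation_matrix` are all hypothetical and chosen for illustration only; they are not taken from the paper or its results.

```python
from math import sqrt

# The four ability names come from the abstract.
ABILITIES = ["Anchoring", "Recalling", "Bounding", "Enacting"]

# HYPOTHETICAL scores: one row per model, one column per ability.
# These numbers are invented for illustration, not results from the paper.
SCORES = [
    [7.1, 6.8, 6.0, 6.9],
    [8.0, 7.9, 6.4, 7.8],
    [6.2, 5.9, 6.1, 6.0],
    [8.6, 8.4, 7.0, 8.5],
    [7.5, 7.0, 6.6, 7.2],
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(rows):
    """Correlate every pair of columns (abilities) across rows (models)."""
    columns = list(zip(*rows))
    return [[pearson_r(a, b) for b in columns] for a in columns]


if __name__ == "__main__":
    matrix = correlation_matrix(SCORES)
    for name, row in zip(ABILITIES, matrix):
        print(f"{name:<10}", " ".join(f"{r:+.2f}" for r in row))
```

Each cell of the resulting matrix corresponds to one cell of a heatmap: the diagonal is 1 by construction, and the matrix is symmetric because correlation does not depend on argument order.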