Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing

Wenyuan Zhang; Shuaiyi Nie; Jiawei Sheng; Zefeng Zhang; Xinghua Zhang; Yongquan He; Tingwen Liu

TL;DR

This work formalizes the challenge of detecting character knowledge errors in LLM-based role-playing via the RoleKE-Bench benchmark, which targets known knowledge errors (KKE) and unknown knowledge errors (UKE) across four memory types. It reveals that current models struggle to detect these errors, with maximum accuracy around 65%, and that KKE is particularly difficult. To address this, the authors propose S$^2$RD, an agent-based reasoning framework combining Self-Recollection and Self-Doubt to improve error detection, achieving sizable gains over strong baselines. The findings highlight the need to integrate error detection into automatic corpus construction and model training to build safer, more faithful role-playing agents.

Abstract

Large language model (LLM) role-playing has gained widespread attention. Authentic character knowledge is crucial for constructing realistic LLM role-playing agents. However, existing work has largely overlooked LLMs' ability to detect characters' known knowledge errors (KKE) and unknown knowledge errors (UKE) while playing roles, a gap that leads to low-quality automatically constructed character training corpora. In this paper, we propose RoleKE-Bench to evaluate LLMs' ability to detect KKE and UKE. The results indicate that even the latest LLMs struggle to detect these two types of errors effectively, especially when it comes to familiar knowledge. We experiment with various reasoning strategies and propose an agent-based reasoning method, Self-Recollection and Self-Doubt (S$^2$RD), to further explore the potential for improving error detection. Experiments show that our method effectively improves LLMs' ability to detect erroneous character knowledge, but the problem remains open and requires ongoing attention.

Paper Structure (33 sections, 3 equations, 6 figures, 20 tables)

Introduction
Related Work
Problem Formulation
Character Knowledge Taxonomy
Character Knowledge Errors
Task Definition
RoleKE-Bench
Correct Memory Generation
Erroneous Knowledge Injection
Benchmark Statistics
Methodology
Self-Recollection
Self-Doubt
Evaluation
Setting and Metrics
...and 18 more sections

Figures (6)

Figure 1: The real responses of GPT-3.5-turbo-0125 while playing Isaac Newton revealed some inconsistencies. In (a), although the LLM denied that Marie Curie was a scientist from Newton's time, it still showed an undue familiarity with her, exceeding the character's knowledge boundaries. In (b), the LLM incorrectly attributed the invention of the microscope, which was created before Newton's birth, to the wrong inventor.
Figure 2: Overview of Probing Dataset construction. First, we create correct character memories, which encompass the knowledge that the character should proficiently possess. Second, we inject erroneous knowledge, simulating both types of errors and preserving the modification details, which results in final queries.
Figure 3: Overview of S$^2$RD. First, the model restates the character based on the profile, and this narrative serves as input for all subsequent agents. Then, it undergoes two steps of reasoning: self-recollection and self-doubt. Finally, all results are combined into the context of the last agent to detect errors.
Figure 4: t-SNE visualization on two characters with LLaMA3-8b. For more results, refer to Figure \ref{fig:overall}.
Figure 5: The accuracy of the LLM judges based on human annotations.
...and 1 more figures
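
The pipeline described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `llm` callable, the function name `s2rd_detect`, the prompt wording, and the yes/no output format are all assumptions introduced here. Only the overall flow (restate the character, run self-recollection and self-doubt, then combine everything for a final detection agent) comes from the caption.

```python
from typing import Callable, Dict

# Hypothetical interface: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def s2rd_detect(llm: LLM, profile: str, query: str) -> Dict[str, object]:
    """Sketch of an S^2RD-style flow; all prompts are illustrative assumptions."""
    # Restate the character from the profile; this narrative feeds every later agent.
    narrative = llm(f"Restate this character in your own words:\n{profile}")

    # Self-recollection: recall what the character should know about the query.
    recollection = llm(
        f"Character: {narrative}\n"
        f"Recall the character's knowledge relevant to this query:\n{query}"
    )

    # Self-doubt: question anything that may lie outside the character's knowledge.
    doubt = llm(
        f"Character: {narrative}\n"
        f"List anything in this query the character should doubt or could not know:\n{query}"
    )

    # Final agent: combine all intermediate results and decide.
    verdict = llm(
        f"Character: {narrative}\n"
        f"Recollection: {recollection}\n"
        f"Doubts: {doubt}\n"
        f"Query: {query}\n"
        "Does the query contain a character knowledge error? Answer yes or no."
    )

    return {
        "narrative": narrative,
        "recollection": recollection,
        "doubt": doubt,
        "has_error": verdict.strip().lower().startswith("yes"),
    }
```

Passing the model as a plain callable keeps the sketch independent of any particular API client, so a stub function can stand in for a real model when checking the control flow.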
