RoleRAG: Enhancing LLM Role-Playing via Graph Guided Retrieval

Yongjie Wang, Jonathan Leung, Zhiqi Shen

TL;DR

RoleRAG tackles hallucinations in role-playing LLMs by coupling knowledge-graph–based indexing with a boundary-aware retrieval mechanism that accounts for entity ambiguity and the limits of a character's knowledge. The framework normalizes entities, constructs a deduplicated knowledge graph, and retrieves both specific and general context while explicitly rejecting out-of-scope queries. Empirical results across multiple datasets show that RoleRAG improves knowledge exposure and reduces hallucinations compared with strong baselines, with notable gains for less frequent characters and for general vs. specific queries. The work points to a practical, retrieval-based path toward more faithful, context-aware role-playing agents.

Abstract

Large Language Models (LLMs) have shown promise in character imitation, enabling immersive and engaging conversations. However, they often generate content that is irrelevant or inconsistent with a character's background. We attribute these failures to: (1) the inability to accurately recall character-specific knowledge due to entity ambiguity, and (2) a lack of awareness of the character's cognitive boundaries. To address these issues, we propose RoleRAG, a retrieval-based framework that integrates efficient entity disambiguation for knowledge indexing with a boundary-aware retriever for extracting contextually appropriate information from a structured knowledge graph. Experiments on role-playing benchmarks show that RoleRAG's calibrated retrieval helps both general-purpose and role-specific LLMs better align with character knowledge and reduce hallucinated responses.
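
The pipeline described above (normalize entities, build a deduplicated graph, retrieve context, reject out-of-scope mentions) can be illustrated with a minimal sketch. This is not the authors' implementation: the alias table, the `KnowledgeGraph` class, and the `normalize` and `retrieve` functions are hypothetical names introduced here, and a plain dictionary lookup stands in for whatever disambiguation and boundary-detection method the paper actually uses.

```python
from collections import defaultdict

# Hypothetical alias table (an assumption for illustration): maps lowercase
# surface forms to a single canonical entity name.
ALIASES = {
    "the boy who lived": "Harry Potter",
    "harry": "Harry Potter",
    "harry potter": "Harry Potter",
    "hogwarts school": "Hogwarts",
    "hogwarts": "Hogwarts",
}


def normalize(name):
    """Map a surface form to its canonical entity, or None if unknown."""
    return ALIASES.get(name.strip().lower())


class KnowledgeGraph:
    """Toy graph: canonical entity -> set of (relation, canonical entity)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, head, relation, tail):
        h, t = normalize(head), normalize(tail)
        if h is None or t is None:
            raise ValueError(f"unknown entity in triple: {head!r}, {tail!r}")
        # Storing edges in a set collapses triples that differ only by alias.
        self.edges[h].add((relation, t))

    def neighbors(self, entity):
        """Return the outgoing (relation, tail) pairs in sorted order."""
        return sorted(self.edges.get(entity, ()))


def retrieve(graph, query_entities):
    """Return (context, rejected) for the entity mentions in a query.

    Mentions that do not resolve to a known entity are treated as outside
    the character's knowledge boundary and are rejected, not answered.
    """
    context, rejected = [], []
    for mention in query_entities:
        entity = normalize(mention)
        if entity is None:
            rejected.append(mention)
            continue
        for relation, tail in graph.neighbors(entity):
            context.append((entity, relation, tail))
    return context, rejected
```

In this sketch, adding the triples ("Harry", "attends", "Hogwarts School") and ("The Boy Who Lived", "attends", "hogwarts") yields a single edge, and a query mentioning "harry" and "iPhone" returns that edge as context while listing "iPhone" as rejected. A real system would need a far richer notion of scope than alias-table membership.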

Paper Structure

This paper contains 29 sections, 1 equation, 10 figures, 4 tables, 1 algorithm.

Figures (10)

  • Figure 1: LLMs perform worse on role-specific questions, particularly when imitating lower-frequency characters.
  • Figure 2: Workflow of our proposed RoleRAG.
  • Figure 3: Illustration of evaluation metrics. We encourage LLMs to exhibit more personal traits, minimize fabricated content, and align more closely with the boundaries of character cognition.
  • Figure 4: Experiments on out-of-scope questions in the RoleBench-zh dataset.
  • Figure 5: Word cloud for responses generated by GPT-4o mini when role-playing as Harry Potter.
  • ...and 5 more figures