BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters
Ting Bai, Jiazheng Kang, Jiayang Fan
TL;DR
This work tackles data scarcity in historical character role-playing by introducing BaiJia, a large-scale corpus of low-resource data on Chinese historical figures spanning five dynasties. A three-part dataset pipeline (resume collection, dialogue generation, and question construction) produces multi-modal, knowledge-rich profiles for supervised fine-tuning (SFT) and role-playing (RP) with LLMs, covering 19,281 characters and 15 resume categories. An evaluation benchmark across six RP dimensions (plus three new ones for depth) shows that incorporating BaiJia data significantly improves performance across diverse models, including general-purpose and RP-focused LLMs; ablation and case studies further validate the utility of both the resumes and the generated dialogues. Overall, BaiJia provides a foundational resource for low-resource historical AI, enabling more coherent, culturally aware, and historically grounded interactions with Chinese historical figures and supporting future research on historical knowledge-grounded RP with LLMs.
Abstract
We introduce BaiJia, a comprehensive large-scale role-playing agent corpus comprising a wide range of Chinese historical characters. The corpus is notable as a pioneering compilation of low-resource data that large language models (LLMs) can use to power AI-driven historical role-playing agents. BaiJia addresses the challenge of historical textual records being fragmented across different forms and modalities by integrating each character's information, including biographical and literary information, family relations, historical events, and so on. We conduct extensive experiments demonstrating that the BaiJia agent corpus strengthens the role-playing abilities of various foundation LLMs and promotes the development and assessment of LLMs on historical role-playing tasks. The agent corpus is available at baijia.online.
