Kongzi: A Historical Large Language Model with Fact Enhancement

Jiashu Yang, Ningning Wang, Yian Zhao, Chaoran Feng, Junjia Du, Hao Pang, Zhirui Fang, Xuxin Cheng

TL;DR

Kongzi addresses the challenge of factual inaccuracies in long historical reasoning by training a domain-adapted LLM on curated classical Chinese corpora and integrating a fact-aware reinforcement learning stage. The method combines continued pre-training, two-stage chain-of-thought–oriented supervised fine-tuning, and a Group Relative Policy Optimization (GRPO) objective with an entity-level factuality reward to ground long-form narratives in verified facts. Empirical results on historical question answering and narrative generation show Kongzi surpassing prior models in both factual accuracy and reasoning depth, with the 7B version achieving strong performance and RL-enhanced variants yielding additional gains. This work demonstrates the viability of domain-specific RL paradigms with factual constraints to improve reliability in professional, knowledge-intensive domains like history.

Abstract

The capabilities of the latest large language models (LLMs) have been extended from pure natural language understanding to complex reasoning tasks. However, current reasoning models often exhibit factual inaccuracies in longer reasoning chains, which poses challenges for historical reasoning and limits the potential of LLMs in complex, knowledge-intensive tasks. Historical studies require not only the accurate presentation of factual information but also the ability to establish cross-temporal correlations and derive coherent conclusions from fragmentary and often ambiguous sources. To address these challenges, we propose Kongzi, a large language model specifically designed for historical analysis. Through the integration of curated, high-quality historical data and a novel fact-reinforcement learning strategy, Kongzi demonstrates strong factual alignment and sophisticated reasoning depth. Extensive experiments on tasks such as historical question answering and narrative generation demonstrate that Kongzi outperforms existing models in both factual accuracy and reasoning depth. By effectively addressing the unique challenges inherent in historical texts, Kongzi sets a new standard for the development of accurate and reliable LLMs in professional domains.

Paper Structure

This paper contains 22 sections, 3 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Overview of our pipeline. In terms of data processing, we clean the original corpus and separate it into factual data and inference text, where factual data undergoes augmentation while inference text is chunked for generating high-quality CoT data. For model training, we first conduct CPT with basic data, followed by a two-stage SFT to develop both fundamental QA capabilities and CoT reasoning abilities. Finally, Content RL is employed to optimize output quality through reinforcement learning.
  • Figure 2: Comparison of Kongzi's scores with other models under different judges. The model score is a weighted sum of the following components: the score for the model's response process (the average score of Historical Accuracy, Logical Reasoning, and Problem Solving, with a weight of 0.8), the score for the thought process (with a weight of 0.1), and the ratio of the model's responses that outperform the Deepseek-r1 sample (with a weight of 0.1).
  • Figure 3: Results of Deepseek on the spatial-temporal relationship comprehension test.
  • Figure 4: Results of Deepseek on the spatial-temporal relationship comprehension test.
  • Figure 5: Results of QWQ on the spatial-temporal relationship comprehension test.
  • ...and 7 more figures
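
The weighted sum described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustration of the stated weights (0.8, 0.1, 0.1), not the authors' evaluation code. The function name `model_score` and its parameter names are hypothetical, and the sketch assumes every component has already been normalized to a common 0 to 1 scale, which the caption does not specify.

```python
from statistics import mean

# Weights as stated in the Figure 2 caption.
W_RESPONSE = 0.8  # response process (average of three criteria)
W_THOUGHT = 0.1   # thought process
W_WIN = 0.1       # ratio of responses that outperform the Deepseek-r1 sample


def model_score(historical_accuracy, logical_reasoning, problem_solving,
                thought_score, win_ratio):
    """Combine the per-judge components into a single weighted score.

    Assumption (not stated in the source): all inputs share one scale,
    here taken to be 0-1, so that the weighted sum is also in 0-1.
    """
    # The response-process score is the plain average of the three criteria.
    response_score = mean([historical_accuracy, logical_reasoning, problem_solving])
    return (W_RESPONSE * response_score
            + W_THOUGHT * thought_score
            + W_WIN * win_ratio)


if __name__ == "__main__":
    # Hypothetical component values, chosen only to exercise the formula.
    print(round(model_score(0.9, 0.8, 0.7, 0.6, 0.5), 2))
```

With the hypothetical inputs above, the response-process average is 0.8, so the combined score is 0.8 × 0.8 + 0.1 × 0.6 + 0.1 × 0.5, which is approximately 0.75. Because the response process carries 80% of the weight, differences in the win ratio or thought-process score move the total only slightly.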