CiteCheck: Towards Accurate Citation Faithfulness Detection

Ziyao Xu; Shaohang Wei; Zhuoheng Han; Jing Jin; Zhe Yang; Xiaoguang Li; Haochen Tan; Zhijiang Guo; Houfeng Wang

TL;DR

CiteCheck tackles citation faithfulness detection in Chinese RAG systems by introducing the first large-scale Chinese dataset for the task, built with a cost-efficient two-stage annotation workflow. The authors combine question collection from diverse sources, GPT-4o-assisted data augmentation to generate high-quality negative samples, and careful manual validation to produce a balanced training set and a challenging test set. Zero-shot evaluations show that even state-of-the-art LLMs struggle to detect unsupported citations, while parameter-efficient fine-tuning lets smaller models achieve strong performance thanks to the augmented training data. This work provides a practical foundation for reliable, citation-grounded Chinese RAG applications and offers a scalable methodology for dataset construction in low-resource language settings.

Abstract

Citation faithfulness detection is critical for enhancing retrieval-augmented generation (RAG) systems, yet large-scale Chinese datasets for this task are scarce. Existing methods incur prohibitive costs because they require manually annotated negative samples. To address this, we introduce CiteCheck, the first large-scale Chinese dataset for citation faithfulness detection, constructed via a cost-effective approach based on two-stage manual annotation. This method balances positive and negative samples while significantly reducing annotation costs. CiteCheck comprises training and test splits. Experiments demonstrate that: (1) the test samples are highly challenging, with even state-of-the-art LLMs failing to achieve high accuracy; and (2) training data augmented with LLM-generated negative samples enables smaller models to attain strong performance through parameter-efficient fine-tuning. CiteCheck provides a robust foundation for advancing citation faithfulness detection in Chinese RAG systems. The dataset is publicly available to facilitate research.
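To make the task framing concrete, the sketch below treats citation faithfulness detection as binary classification over (statement, cited passage) pairs, pairs each supported sample with one unsupported variant to keep the labels balanced, and scores predictions by accuracy. This is an illustrative sketch only, not the authors' pipeline: the `CitationSample` schema, the functions `build_balanced_set`, `label_balance`, and `accuracy`, and the `make_negative` callable (a hypothetical stand-in for an LLM rewrite step such as a GPT-4o prompt) are all assumptions, and the actual CiteCheck format and augmentation procedure may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CitationSample:
    # Hypothetical schema, not the published CiteCheck format.
    statement: str  # sentence from a generated answer
    citation: str   # retrieved passage that the sentence cites
    label: int      # 1 = passage supports statement, 0 = unsupported


def build_balanced_set(
    positives: List[CitationSample],
    make_negative: Callable[[str], str],
) -> List[CitationSample]:
    """Pair each supported sample with one unsupported variant.

    make_negative is a hypothetical stand-in for an LLM rewrite step;
    here it is any callable that maps a statement to an altered one.
    """
    out: List[CitationSample] = []
    for p in positives:
        out.append(p)
        # The altered statement keeps the original citation but is labeled unsupported.
        out.append(CitationSample(make_negative(p.statement), p.citation, 0))
    return out


def label_balance(samples: List[CitationSample]) -> float:
    """Return the fraction of positive (supported) samples."""
    if not samples:
        raise ValueError("empty sample list")
    return sum(s.label for s in samples) / len(samples)


def accuracy(predictions: List[int], samples: List[CitationSample]) -> float:
    """Fraction of predicted labels that match the gold labels."""
    if len(predictions) != len(samples):
        raise ValueError("predictions and samples differ in length")
    if not samples:
        raise ValueError("empty sample list")
    correct = sum(int(p == s.label) for p, s in zip(predictions, samples))
    return correct / len(samples)


if __name__ == "__main__":
    pos = [CitationSample("A", "passage A", 1), CitationSample("B", "passage B", 1)]
    data = build_balanced_set(pos, lambda s: s + " (altered)")
    print(label_balance(data))           # 0.5
    print(accuracy([1, 0, 1, 1], data))  # 0.75
```

Pairing one negative per positive is simply the easiest way to guarantee a 50/50 label split; with a balanced set, plain accuracy is a reasonable headline metric, whereas an imbalanced set would call for per-class precision and recall.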
