Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking
Xiaokang Zhang, Zijun Yao, Jing Zhang, Kaifeng Yun, Jifan Yu, Juanzi Li, Jie Tang
TL;DR
PINOSE detects non-factual content in LLM outputs by learning from offline consistency signals rather than from labor-intensive human labels. It uses a three-stage pipeline (data preparation with question bootstrapping, offline consistency checking to generate pseudo labels, and probe construction on internal LLM representations) to build a transferable factuality detector that performs well across diverse data distributions while avoiding online multi-pass inference. The method achieves substantial gains over supervised probing baselines and outperforms self-consistency baselines in QA variation tasks, demonstrating strong cross-model transferability and improved efficiency. The approach offers a scalable path toward more trustworthy LLM outputs, with practical value for deployment and evaluation in settings with limited labeled data and heterogeneous distributions.
Abstract
Detecting non-factual content is a longstanding goal for increasing the trustworthiness of large language model (LLM) generations. Current factuality probes, trained on human-annotated labels, exhibit limited transferability to out-of-distribution content, while online self-consistency checking imposes a heavy computational burden because it requires generating multiple outputs. This paper proposes PINOSE, which trains a probing model on offline self-consistency checking results, thereby circumventing the need for human-annotated data and achieving transferability across diverse data distributions. Because the consistency check is performed offline, PINOSE avoids the computational cost of generating multiple responses that online consistency verification incurs. Additionally, it examines various aspects of internal states prior to response decoding, contributing to more effective detection of factual inaccuracies. Experimental results on both factuality detection and question answering benchmarks show that PINOSE outperforms existing factuality detection methods. Our code and datasets are publicly available on this anonymized repository.
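
The following is a minimal sketch of the general idea of training a probe on offline consistency labels, not the authors' implementation. It makes several assumptions: agreement between responses is approximated by normalized string matching (the paper's consistency check is not specified here), the internal LLM representations are replaced by small hand-written feature vectors, and the probe is a plain logistic regression. The names `pseudo_label`, `train_probe`, and `probe_score` are hypothetical.

```python
import math


def pseudo_label(target, samples, threshold=0.5):
    """Offline consistency check (assumed form): label the target response 1
    if more than `threshold` of the sampled responses agree with it, else 0.
    Agreement here is a simple normalized string match."""
    def norm(s):
        return s.strip().lower()
    if not samples:
        return 0
    agree = sum(norm(s) == norm(target) for s in samples)
    return 1 if agree / len(samples) > threshold else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(w, b, x):
    """Probability that the response represented by x is factual."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with per-example gradient steps."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            grad = probe_score(w, b, x) - y  # derivative of log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


# Toy stand-ins for hidden states; a real probe would read them from the LLM.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [
    pseudo_label("Paris", ["paris", "Paris ", "Lyon"]),   # consistent
    pseudo_label("Paris", ["Paris", "Paris", "Paris"]),   # consistent
    pseudo_label("Lyon", ["Paris", "Paris", "Lyon"]),     # inconsistent
    pseudo_label("Nice", ["Paris", "Lyon", "Paris"]),     # inconsistent
]
w, b = train_probe(features, labels)
```

At inference time only `probe_score` is evaluated on a single representation, so no additional responses need to be sampled; the multi-sample cost is paid once, offline, when the pseudo labels are built.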
