Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking

Xiaokang Zhang, Zijun Yao, Jing Zhang, Kaifeng Yun, Jifan Yu, Juanzi Li, Jie Tang

TL;DR

PiNose tackles non-factual content in LLMs by learning from offline consistency signals rather than relying on labor-intensive human labels. It uses a three-stage pipeline—data preparation with question bootstrapping, offline consistency checking to generate pseudo labels, and probe construction that operates on internal LLM representations—to create a transferable factuality detector that performs well across diverse data distributions while avoiding online multi-pass inference. The method achieves substantial gains over supervised probing baselines and outperforms self-consistency baselines in QA variation tasks, demonstrating strong cross-model transferability and improved efficiency. This approach offers a scalable path toward more trustworthy LLM outputs, with practical impact for deployment and evaluation in settings with limited labeled data and heterogeneous distributions.

Abstract

Detecting non-factual content is a longstanding goal for increasing the trustworthiness of large language model (LLM) generations. Current factuality probes, trained on human-annotated labels, transfer poorly to out-of-distribution content, while online self-consistency checking imposes a heavy computational burden because it must generate multiple outputs. This paper proposes PiNose, which trains a probing model on offline self-consistency checking results, thereby circumventing the need for human-annotated data and achieving transferability across diverse data distributions. Because the consistency check is performed offline, PiNose avoids the cost of generating multiple responses that online consistency verification incurs. Additionally, it examines various aspects of internal states prior to response decoding, contributing to more effective detection of factual inaccuracies. Experimental results on both factuality detection and question answering benchmarks show that PiNose outperforms existing factuality detection methods. Our code and datasets are publicly available on this anonymized repository.
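The idea in the abstract, labeling responses offline by consistency and then training a probe on internal states, can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the authors' implementation: a majority vote over review verdicts stands in for the consistency check, random 2-dimensional vectors stand in for LLM hidden states, and a logistic-regression probe is swapped in for whatever probing model the paper uses. The names `pseudo_label`, `train_probe`, and `probe_score` are hypothetical.

```python
import math
import random


def pseudo_label(review_votes):
    """Turn N review verdicts (True = 'consistent with peer responses')
    into a pseudo label: 1 on a strict majority of True, else 0.
    Majority voting is an assumption of this sketch."""
    return 1 if 2 * sum(review_votes) > len(review_votes) else 0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(w, b, x):
    """Probability that the representation x belongs to a factual response."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with plain SGD (stand-in probe)."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            grad = probe_score(w, b, x) - y  # d(log-loss)/d(logit)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


# Offline phase: build pseudo-labeled training data once.
rng = random.Random(0)
features, labels = [], []
for i in range(40):
    consistent = i % 2 == 0
    center = 1.0 if consistent else -1.0
    # Toy stand-in for a hidden-state vector taken before decoding.
    features.append([center + rng.gauss(0, 0.3), rng.gauss(0, 0.3)])
    votes = [consistent] * 3 + [not consistent] * 2  # N = 5 reviews, 3-2 split
    labels.append(pseudo_label(votes))

w, b = train_probe(features, labels)

# Online phase: a single forward pass of the probe, no extra sampling.
score = probe_score(w, b, [1.0, 0.0])
```

The point of the split is that the multiple responses and reviews are only needed while building `labels`; scoring a new input afterwards calls `probe_score` once.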

Paper Structure (29 sections, 4 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: The overall architecture of PiNose.
  • Figure 2: Effects of question generation and of the number of reviews and responses. We assess three question distributions for factuality detection training data: "self questions" ($1,000$ questions from the training data within the same dataset), "external questions" ($5,000$ questions from a different dataset), and our proposed approach, "generated questions" (which does not rely on available questions). Subfigures (a)-(c) show the effects of different question distributions on various test sets, while subfigure (d) shows the effects of varying $k$ (the number of responses) and $N$ (the number of review rounds per response) on NQ.
  • Figure 3: AUC obtained using the internal representations of different layers at the probe construction stage.
  • Figure 4: Prompt for question generation in PiNose. Five seed questions are provided, and the blank following item 6 prompts the LLM to generate a new question.
  • Figure 5: Prompt for response generation in PiNose. Five different instructions are randomly employed to elicit diverse responses.
  • ...and 3 more figures