TAB-AUDIT: Detecting AI-Fabricated Scientific Tables via Multi-View Likelihood Mismatch

Shuo Huang, Yan Pen, Lizhen Qu

Abstract

AI-generated fabricated scientific manuscripts raise growing concerns about large-scale breaches of academic integrity. In this work, we present the first systematic study on detecting AI-generated fabricated scientific tables in empirical NLP papers, since the information in tables serves as critical evidence for claims. We construct FabTab, the first benchmark dataset of fabricated manuscripts with tables, comprising 1,173 AI-generated papers and 1,215 human-authored ones in empirical NLP. Through a comprehensive analysis, we identify systematic differences between fabricated and real tables and operationalize them into a set of discriminative features within the TAB-AUDIT framework. The key feature, within-table mismatch, captures the perplexity gap between a table's skeleton and its numerical content. Experimental results show that a RandomForest classifier built on these features significantly outperforms prior state-of-the-art methods, achieving 0.987 AUROC in-domain and 0.883 AUROC out-of-domain. Our findings highlight experimental tables as a critical forensic signal for detecting AI-generated scientific fraud and provide a new benchmark for future research.

Paper Structure (45 sections, 12 equations, 5 figures, 12 tables)

Figures (5)

  • Figure 1: Compact illustration of the table skeleton–numeric mismatch.
  • Figure 2: Representative paper-level distributions for three observer-based indicators: skeleton likelihood (left), digit-level numeric likelihood (center), and the within-table numeric–skeleton mismatch (right). The key pattern is not merely that fabricated tables can have unusual numbers in isolation, but that their numbers are atypical relative to an otherwise conventional scientific skeleton.
  • Figure 3: Paper-context sensitivity of table perplexity. Empirical CDF of $\Delta \log \mathrm{PPL} = \log \mathrm{PPL}_{ctx}-\log \mathrm{PPL}_{only}$, where $\mathrm{PPL}_{ctx}$ scores table tokens conditioned on the preceding paper content (prefix masked from loss), and $\mathrm{PPL}_{only}$ scores the same table tokens in isolation. Negative values indicate that paper context makes the table more predictable under the observer.
  • Figure 4: Literature-grounded AI paper generation pipeline used to construct the fabricated-paper benchmark.
  • Figure 5: Table Authenticity Judgment Form.
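
The quantity plotted in Figure 3 can be illustrated with a minimal sketch. It assumes the observer model has already returned natural-log probabilities for each table token, once with the paper prefix in context and once for the table alone. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def log_ppl(token_logprobs: Sequence[float]) -> float:
    """Log-perplexity: negative mean of per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    return -sum(token_logprobs) / len(token_logprobs)


def delta_log_ppl(ctx_logprobs: Sequence[float],
                  only_logprobs: Sequence[float]) -> float:
    """Delta log PPL = log PPL_ctx - log PPL_only, as defined in the Figure 3 caption.

    Both sequences score the same table tokens; the paper prefix is assumed
    to be excluded from the loss, so only table tokens appear here.
    """
    return log_ppl(ctx_logprobs) - log_ppl(only_logprobs)


# Toy log-probabilities (hypothetical, for illustration only).
only = [-2.0, -1.0, -3.0]   # table scored in isolation
ctx = [-1.0, -0.5, -1.5]    # same tokens, conditioned on the paper prefix

print(delta_log_ppl(ctx, only))  # -1.0
```

A negative result, as in this toy case, corresponds to the reading given in the caption: the paper context makes the table more predictable under the observer.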