TAB-AUDIT: Detecting AI-Fabricated Scientific Tables via Multi-View Likelihood Mismatch

Shuo Huang; Yan Pen; Lizhen Qu


Abstract

AI-generated fabricated scientific manuscripts raise growing concerns about large-scale breaches of academic integrity. In this work, we present the first systematic study of detecting AI-generated fabricated scientific tables in empirical NLP papers, since tables serve as critical evidence for a paper's claims. We construct FabTab, the first benchmark dataset of fabricated manuscripts with tables, comprising 1,173 AI-generated papers and 1,215 human-authored ones in empirical NLP. Through a comprehensive analysis, we identify systematic differences between fabricated and real tables and operationalize them into a set of discriminative features within the TAB-AUDIT framework. The key feature, within-table mismatch, captures the perplexity gap between a table's skeleton and its numerical content. Experimental results show that a RandomForest classifier built on these features significantly outperforms prior state-of-the-art methods, achieving 0.987 AUROC in-domain and 0.883 AUROC out-of-domain. Our findings highlight experimental tables as a critical forensic signal for detecting AI-generated scientific fraud and provide a new benchmark for future research.
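As a rough illustration of the within-table mismatch idea, the sketch below splits a serialized table into a skeleton view (numbers replaced by a placeholder) and a numeric view, then scores the mismatch as the difference between the two views' log-perplexities. Everything here is an assumption for illustration, not TAB-AUDIT's actual serialization: the `<NUM>` placeholder, the number regex, the digit-level view format, the sign convention (numeric minus skeleton), and the helper names `split_views`, `digit_view`, `log_ppl`, and `within_table_mismatch` are all hypothetical. In practice the per-token log-probabilities would come from an observer language model, which is not shown.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical number pattern: optional sign, integer or decimal, optional "%".
# The look-arounds keep it from matching digits inside tokens such as "F1".
NUMBER_RE = re.compile(r"(?<![\w.])[-+]?\d+(?:\.\d+)?%?(?![\w.])")
PLACEHOLDER = "<NUM>"  # assumed placeholder token for the skeleton view


def split_views(table_text: str) -> Tuple[str, List[str]]:
    """Split a serialized table into a skeleton view and its numeric cells."""
    skeleton = NUMBER_RE.sub(PLACEHOLDER, table_text)
    numbers = NUMBER_RE.findall(table_text)
    return skeleton, numbers


def digit_view(numbers: Sequence[str]) -> str:
    """Assumed digit-level view: characters space-separated, cells joined by ' ; '."""
    return " ; ".join(" ".join(num) for num in numbers)


def log_ppl(token_logprobs: Sequence[float]) -> float:
    """Log-perplexity: mean negative log-likelihood per scored token."""
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    return -sum(token_logprobs) / len(token_logprobs)


def within_table_mismatch(skeleton_logprobs: Sequence[float],
                          numeric_logprobs: Sequence[float]) -> float:
    """Hypothetical mismatch score: numeric-view minus skeleton-view log-PPL.

    Positive values mean the numbers are harder for the observer to predict
    than the surrounding table structure.
    """
    return log_ppl(numeric_logprobs) - log_ppl(skeleton_logprobs)


if __name__ == "__main__":
    table = "Model | Acc | F1\nBERT | 91.2 | 88.7\nOurs | 93.5 | 90.1"
    skeleton, numbers = split_views(table)
    print(skeleton)                 # numbers replaced by <NUM>; "F1" left intact
    print(numbers)                  # ['91.2', '88.7', '93.5', '90.1']
    print(digit_view(numbers[:1]))  # 9 1 . 2
    # Toy log-probs standing in for an observer LM's per-token scores.
    print(within_table_mismatch([-1.0, -1.0], [-2.0, -4.0]))  # 2.0
```

Using the difference of log-perplexities (rather than a ratio of raw perplexities) keeps the score on the same additive scale as per-token negative log-likelihood, so tables of different lengths remain comparable.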


Paper Structure

This paper contains 45 sections, 12 equations, 5 figures, and 12 tables.

Introduction
Benchmark Construction and Paper Generation
Collecting Recent Experimental NLP Papers
Literature-Grounded Fabricated Paper Generation
Benchmark Details
Can humans really differentiate AI-fabricated tables?
Signal Analysis: How Fabricated Tables Differ from Human Experimental Tables
Why a forensic signal should exist
Multi-view likelihood evidence for numeric–skeleton mismatch
Complementary LM-free numeric signals
Implications for auditing
TAB-AUDIT: A Signal-Driven Paper-Level Auditing Framework
Problem setup
Multi-view table serialization
Likelihood-based signal extraction
...and 30 more sections

Figures (5)

Figure 1: Compact illustration of the table skeleton–numeric mismatch.
Figure 2: Representative paper-level distributions for three observer-based indicators: skeleton likelihood (left), digit-level numeric likelihood (center), and the within-table numeric–skeleton mismatch (right). The key pattern is not merely that fabricated tables can have unusual numbers in isolation, but that their numbers are atypical relative to an otherwise conventional scientific skeleton.
Figure 3: Paper-context sensitivity of table perplexity. Empirical CDF of $\Delta \log \mathrm{PPL} = \log \mathrm{PPL}_{\mathrm{ctx}} - \log \mathrm{PPL}_{\mathrm{only}}$, where $\mathrm{PPL}_{\mathrm{ctx}}$ scores the table tokens conditioned on the preceding paper content (with the prefix masked from the loss) and $\mathrm{PPL}_{\mathrm{only}}$ scores the same table tokens in isolation. Negative values indicate that paper context makes the table more predictable under the observer.
Figure 4: Literature-grounded AI paper generation pipeline used to construct the fabricated-paper benchmark.
Figure 5: Table Authenticity Judgment Form.
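Because $\mathrm{PPL} = \exp\big(-\frac{1}{T}\sum_{t=1}^{T} \log p_t\big)$, the log-perplexity in the Figure 3 caption is simply the mean negative log-likelihood of the table tokens, so $\Delta \log \mathrm{PPL}$ can be computed directly from per-token log-probabilities. The minimal sketch below assumes an observer LM has already scored the table twice (once after the paper prefix, once in isolation), that both passes tokenize the table identically, and that prefix tokens are marked `False` in a boolean loss mask. The names `masked_log_ppl` and `delta_log_ppl` and the mask representation are hypothetical, not the paper's code.

```python
from typing import Sequence


def masked_log_ppl(token_logprobs: Sequence[float], loss_mask: Sequence[bool]) -> float:
    """Mean negative log-likelihood over tokens whose mask is True.

    Prefix (paper-context) tokens get mask False: they condition the observer
    but do not contribute to the table's perplexity.
    """
    if len(token_logprobs) != len(loss_mask):
        raise ValueError("log-probs and mask must align")
    kept = [lp for lp, keep in zip(token_logprobs, loss_mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    return -sum(kept) / len(kept)


def delta_log_ppl(ctx_logprobs: Sequence[float],
                  ctx_mask: Sequence[bool],
                  only_logprobs: Sequence[float]) -> float:
    """Delta log PPL = log PPL_ctx - log PPL_only (negative: context helps)."""
    if sum(ctx_mask) != len(only_logprobs):
        # Both passes are assumed to score exactly the same table tokens.
        raise ValueError("table token counts differ between the two passes")
    log_ppl_ctx = masked_log_ppl(ctx_logprobs, ctx_mask)
    log_ppl_only = masked_log_ppl(only_logprobs, [True] * len(only_logprobs))
    return log_ppl_ctx - log_ppl_only


if __name__ == "__main__":
    # Toy values: two masked prefix tokens followed by two table tokens.
    ctx = [-5.0, -4.0, -1.0, -1.0]
    mask = [False, False, True, True]
    only = [-2.0, -2.0]
    print(delta_log_ppl(ctx, mask, only))  # -1.0
```

In this toy case the context halves the table's mean negative log-likelihood, giving a negative $\Delta \log \mathrm{PPL}$, which is the direction the caption reads as "paper context makes the table more predictable".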
