An Information-Theoretic Approach for Detecting Edits in AI-Generated Text
Idan Kashtan, Alon Kipnis
TL;DR
This work addresses the challenge of detecting edits in text produced by a generative language model (GLM) when the edits are sparse and their locations are unknown. It introduces a two-step method: first, each sentence is tested with a log-perplexity-based statistic; second, the per-sentence evidence is aggregated globally using Higher Criticism to detect deviations from purely GLM authorship. The method can also identify candidate edited sentences. The paper provides an information-theoretic analysis of the method’s optimality under sparse mixture models and cross-entropy principles, supported by extensive empirical evaluations on synthetic and realistic datasets, including Wikipedia-style and news articles. It concludes with practical refinements, context-based extensions, and theoretical questions about optimal detector design and the minimal number of edits required for reliable detection, highlighting the approach’s significance for trust, transparency, and governance of GLM-generated content.
Abstract
We propose a method to determine whether a given article was written entirely by a generative language model or contains edits by a different author, possibly a human. Our process involves multiple tests for the origin of individual sentences or other pieces of text, and combines these tests using a method that is sensitive to rare alternatives, i.e., settings in which non-null effects are few and scattered across the text in unknown locations. Interestingly, this method also identifies pieces of text suspected to contain edits. We demonstrate the effectiveness of the method in detecting edits through extensive evaluations using real data and provide an information-theoretic analysis of the factors affecting its success. In particular, we discuss optimality properties under a theoretical framework for text editing in which sentences are generated mainly by the language model, except perhaps for a few sentences that might have originated via a different mechanism. Our analysis raises several interesting research questions at the intersection of information theory and data science.
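The two-step procedure (per-sentence tests, then a global combination sensitive to rare alternatives) can be illustrated with a minimal sketch. It assumes per-sentence log-perplexity values have already been computed by some language model, and that a reference sample of log-perplexities from purely model-generated sentences is available for calibration. The function names (`empirical_pvalues`, `higher_criticism`, `flag_sentences`), the empirical p-value calibration, and the default `gamma=0.3` are illustrative assumptions, not taken from the paper. The statistic shown is one common form of Higher Criticism; the paper's exact variant may differ.

```python
import numpy as np


def empirical_pvalues(stats, null_stats):
    """One-sided p-values: share of null statistics at least as large as each
    observed statistic (with the usual +1 correction to avoid zeros)."""
    null_sorted = np.sort(np.asarray(null_stats, dtype=float))
    stats = np.asarray(stats, dtype=float)
    # Count of null values >= each observed statistic.
    n_ge = len(null_sorted) - np.searchsorted(null_sorted, stats, side="left")
    return (1.0 + n_ge) / (1.0 + len(null_sorted))


def higher_criticism(pvals, gamma=0.3):
    """Return (HC statistic, HC threshold) for a collection of p-values.

    HC = max over i <= gamma*n of sqrt(n) * (i/n - p_(i)) / sqrt(i/n * (1 - i/n)),
    where p_(1) <= ... <= p_(n) are the sorted p-values. The threshold is the
    sorted p-value at which the maximum is attained.
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    if n < 2:
        raise ValueError("need at least two p-values")
    # Restrict to the smallest gamma-fraction; cap at n - 1 so that i/n < 1.
    k = min(max(1, int(np.floor(gamma * n))), n - 1)
    u = np.arange(1, k + 1) / n
    z = np.sqrt(n) * (u - p[:k]) / np.sqrt(u * (1.0 - u))
    i_star = int(np.argmax(z))
    return float(z[i_star]), float(p[i_star])


def flag_sentences(pvals, gamma=0.3):
    """Global HC statistic plus indices of sentences at or below the HC threshold."""
    hc, threshold = higher_criticism(pvals, gamma)
    flagged = np.flatnonzero(np.asarray(pvals, dtype=float) <= threshold)
    return hc, flagged
```

In use, one would pass the per-sentence log-perplexities and the reference sample to `empirical_pvalues`, then call `flag_sentences` on the result. A large HC value suggests the document is not purely model-generated, and the flagged indices are candidate edited sentences. The cutoff for "large" has to be calibrated, for example against HC values computed on documents known to be unedited.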
