Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods
Evan Crothers, Nathalie Japkowicz, Herna Viktor
TL;DR
This survey maps the landscape of machine-generated text threats and defenses, linking threat modeling with detection to promote trustworthy AI. It covers feature-based and neural detectors, domain-specific applications, and human-in-the-loop approaches, and identifies prompt injection as a new vulnerability. It highlights open problems in robustness, fairness, attribution, and policy, and argues for cross-disciplinary collaboration to defend against widespread, accessible NLG abuse. Overall, it emphasizes that effective defenses must combine technical detection with sociotechnical governance to realize the benefits of powerful NLG systems without amplifying harms.
Abstract
Machine-generated text is increasingly difficult to distinguish from human-authored text. Powerful open-source models are freely available, and user-friendly tools that democratize access to generative models are proliferating. ChatGPT, which was released shortly after the first edition of this survey, epitomizes these trends. The great potential of state-of-the-art natural language generation (NLG) systems is tempered by the multitude of avenues for abuse. Detection of machine-generated text is a key countermeasure for reducing abuse of NLG models, with significant technical challenges and numerous open problems. We provide a survey that includes both 1) an extensive analysis of threat models posed by contemporary NLG systems, and 2) the most complete review of machine-generated text detection methods to date. This survey places machine-generated text within its cybersecurity and social context, and provides strong guidance for future work that addresses the most critical threat models and ensures that detection systems themselves demonstrate trustworthiness through fairness, robustness, and accountability.
