RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors
Liam Dugan, Alyssa Hwang, Filip Trhlik, Josh Magnus Ludan, Andrew Zhu, Hainiu Xu, Daphne Ippolito, Chris Callison-Burch
TL;DR
This paper introduces RAID (Robust AI Detection), the largest shared benchmark dataset for evaluating machine-generated text detectors. It covers 11 generator models, 8 domains, 11 adversarial attacks, and 4 decoding strategies, totaling over 6.2 million generations. The authors benchmark 12 detectors (8 open-source, 4 closed-source) and find widespread robustness gaps: open-source detectors show high false positive rates (FPRs) at their default thresholds, accuracy drops substantially under simple variations in generation settings, and detectors are vulnerable to domain-, model-, and attack-specific factors. The study advocates principled evaluation practices, such as per-domain FPR calibration and reporting results across decoding strategies, and provides a leaderboard and RAID-extra to foster ongoing, community-driven improvement. By releasing RAID and RAID-extra, the authors aim to accelerate research on robust detectors and to help society manage the harms associated with machine-generated text through better tools and shared benchmarks.
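To make the idea of per-domain FPR calibration concrete, here is a minimal sketch, not the authors' implementation. It assumes a detector that outputs a score where higher means "more likely machine-generated", and it picks, for each domain, a threshold such that at most a target fraction of human-written texts is flagged. The function names (calibrate_thresholds, false_positive_rate), the domain labels, and the toy scores are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def calibrate_thresholds(human_scores, domains, target_fpr=0.05):
    """Pick one threshold per domain from scores on human-written text.

    A text is flagged as machine-generated when its score is strictly
    greater than the threshold, so each threshold is chosen to keep the
    share of flagged human texts at or below target_fpr (0 <= target_fpr < 1).
    """
    by_domain = defaultdict(list)
    for score, domain in zip(human_scores, domains):
        by_domain[domain].append(score)

    thresholds = {}
    for domain, vals in by_domain.items():
        vals = sorted(vals)
        n = len(vals)
        allowed = math.floor(target_fpr * n)  # false positives we may tolerate
        # Only the `allowed` highest scores can lie strictly above this value.
        thresholds[domain] = vals[n - allowed - 1]
    return thresholds


def false_positive_rate(human_scores, threshold):
    """Fraction of human-written texts whose score exceeds the threshold."""
    return sum(s > threshold for s in human_scores) / len(human_scores)


# Toy detector scores on human-written text (hypothetical values).
news = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.40]
poetry = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.80, 0.95]

scores = news + poetry
labels = ["news"] * len(news) + ["poetry"] * len(poetry)

# One threshold per domain versus a single threshold for all domains.
per_domain = calibrate_thresholds(scores, labels, target_fpr=0.1)
pooled = calibrate_thresholds(scores, ["all"] * len(scores), target_fpr=0.1)["all"]

for name, vals in (("news", news), ("poetry", poetry)):
    print(name,
          "per-domain FPR:", false_positive_rate(vals, per_domain[name]),
          "pooled FPR:", false_positive_rate(vals, pooled))
```

In this toy data the pooled threshold meets the target on average but concentrates all of its false positives in the poetry domain, while the per-domain thresholds hold each domain at the target rate. That is the kind of gap per-domain calibration is meant to expose.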
Abstract
Many commercial and open-source models claim to detect machine-generated text with extremely high accuracy (99% or more). However, very few of these detectors are evaluated on shared benchmark datasets, and even when they are, the datasets used for evaluation are insufficiently challenging, lacking variations in sampling strategy, adversarial attacks, and open-source generative models. In this work, we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks, and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open-source and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models. We release our data along with a leaderboard to encourage future research.
