Evolution of Detection Performance throughout the Online Lifespan of Synthetic Images
Dimitrios Karageorgiou, Quentin Bammey, Valentin Porcellini, Bertrand Goupil, Denis Teyssou, Symeon Papadopoulos
TL;DR
This work tackles the problem of detecting AI-generated synthetic images that spread online, revealing that state-of-the-art detectors struggle in real-world conditions and deteriorate as content is repeatedly post-processed over time. It introduces the FOSID dataset to capture web-scale evolution of online misinformation and systematically evaluates a broad set of SID methods, exposing calibration gaps and degradation in the wild. To mitigate this, the authors propose Retrieval-Assisted Synthetic Image Detection (RASID), which leverages near-duplicate image retrieval to stabilize detection across the image lifespan, achieving average gains of 6.7% in balanced accuracy (BA) and 7.8% in AUC. The results highlight the need for evolution-aware, retrieval-informed SID approaches and provide a publicly available benchmark to drive future improvements in misinformation defense against diffusion-model–generated imagery.
Abstract
Synthetic images disseminated online differ significantly from those used to train and evaluate state-of-the-art detectors. In this work, we analyze the performance of synthetic image detectors as deceptive synthetic images evolve throughout their online lifespan. Our study reveals that, despite advancements in the field, current state-of-the-art detectors struggle to distinguish between synthetic and real images in the wild. Moreover, we show that the time elapsed since a synthetic image first appeared online negatively affects the performance of most detectors. Finally, by employing a retrieval-assisted detection approach, we demonstrate that it is feasible to maintain the initial detection performance throughout the whole online lifespan of an image and to improve the average detection performance across several state-of-the-art detectors by 6.7% in balanced accuracy and 7.8% in AUC.
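The retrieval-assisted idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `ImageCopy` record, the 64-bit perceptual hash, the Hamming-distance threshold, and the rule of scoring the earliest retrieved copy are all assumptions made for illustration. The detector is any callable that returns a score, higher meaning more likely synthetic.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ImageCopy:
    """One online copy of an image. All fields are illustrative assumptions."""
    image_id: str
    phash: int         # hypothetical 64-bit perceptual hash of the copy
    first_seen: float  # hypothetical timestamp of first online appearance


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two non-negative integer hashes."""
    return bin(a ^ b).count("1")


def retrieve_near_duplicates(
    query: ImageCopy, index: Sequence[ImageCopy], max_distance: int = 8
) -> List[ImageCopy]:
    """Return indexed copies whose hash lies within max_distance of the query."""
    return [c for c in index if hamming(query.phash, c.phash) <= max_distance]


def retrieval_assisted_score(
    query: ImageCopy,
    index: Sequence[ImageCopy],
    detector: Callable[[ImageCopy], float],
    max_distance: int = 8,
) -> float:
    """Score the earliest known near-duplicate instead of the query itself.

    Assumption: the earliest copy has undergone the least post-processing,
    so the detector is more reliable on it. If nothing is retrieved, the
    query is scored directly.
    """
    candidates = retrieve_near_duplicates(query, index, max_distance) + [query]
    earliest = min(candidates, key=lambda c: c.first_seen)
    return detector(earliest)
```

In this sketch, a heavily re-shared copy inherits the detector score of the earliest copy found in the index, so the decision does not change as later copies degrade. A linear scan over hashes is used only for brevity; a real system would need an approximate nearest-neighbour index.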
