Autonomous and Self-Adapting System for Synthetic Media Detection and Attribution
Aref Azizpour, Tai D. Nguyen, Matthew C. Stamm
TL;DR
This work tackles the problem that static synthetic-media detectors quickly degrade as new generative models appear. It proposes an autonomous pipeline that combines open-set detection with an evolvable embedding space, unsupervised clustering of unknowns for new-source discovery, and automated update/validation to sustain performance over time. Key components include Forensic Self-Description features projected to enhanced embeddings via a loss $\boldsymbol{L}_{\text{embed}}$ that adds a preservation term to the attribution loss $\boldsymbol{L}_{\text{att}}$, and source distributions modeled by $p(\boldsymbol{\rho} \mid s) = \text{GMM}\big(\boldsymbol{\rho} \mid \{\boldsymbol{\tau}_{s,i}, \boldsymbol{\nu}_{s,i}, \boldsymbol{\theta}_{s,i}\}_{i=1}^{M_s}\big)$, a Gaussian mixture with $M_s$ components for source $s$. New sources are discovered through DBSCAN clustering of unknowns and validated through automated update steps that ensure no performance degradation. Experimental results show superior detection and attribution performance as generators continuously emerge, outperforming static baselines and illustrating the practicality of autonomous forensic systems in rapidly evolving generative landscapes.
Abstract
Rapid advances in generative AI have enabled the creation of highly realistic synthetic images, which, while beneficial in many domains, also pose serious risks in terms of disinformation, fraud, and other malicious applications. Current synthetic image identification systems are typically static, relying on feature representations learned from known generators; as new generative models emerge, these systems suffer from severe performance degradation. In this paper, we introduce the concept of an autonomous self-adaptive synthetic media identification system -- one that not only detects synthetic images and attributes them to known sources but also autonomously identifies and incorporates novel generators without human intervention. Our approach leverages an open-set identification strategy with an evolvable embedding space that distinguishes between known and unknown sources. By employing an unsupervised clustering method to aggregate unknown samples into high-confidence clusters and continuously refining its decision boundaries, our system maintains robust detection and attribution performance even as the generative landscape evolves. Extensive experiments demonstrate that our method significantly outperforms existing approaches, marking a crucial step toward universal, adaptable forensic systems in the era of rapidly advancing generative models.
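The loop described above (open-set attribution, clustering of rejected samples, and incorporation of a discovered source) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes 2-D toy embeddings in place of Forensic Self-Description embeddings, a single diagonal Gaussian per source as a stand-in for the per-source GMM, a hand-picked log-likelihood threshold, and a small hand-written DBSCAN. It also omits the validation step. All names (`fit_gaussian`, `attribute`, `THRESHOLD`, `new_0`) are hypothetical.

```python
import math

def fit_gaussian(points):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to a list of vectors."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    var = [max(sum((p[k] - mean[k]) ** 2 for p in points) / n, 1e-6) for k in range(d)]
    return mean, var

def log_likelihood(x, model):
    """Log-density of x under a diagonal Gaussian model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def attribute(x, models, threshold):
    """Return the most likely known source, or None if x is rejected as unknown."""
    best = max(models, key=lambda s: log_likelihood(x, models[s]))
    return best if log_likelihood(x, models[best]) >= threshold else None

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point, -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point, so keep expanding
    return labels

# Toy embeddings for two known sources (assumed data, for illustration only).
known = {
    "real": [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (0.05, 0.05)],
    "gen_a": [(5.0, 5.0), (5.1, 4.9), (4.9, 5.1), (5.05, 5.0)],
}
models = {s: fit_gaussian(pts) for s, pts in known.items()}
THRESHOLD = -50.0  # assumed rejection threshold on the log-likelihood

incoming = [(0.02, 0.0), (5.0, 5.05), (10.0, 0.0), (10.1, 0.1),
            (9.9, -0.1), (10.0, 0.1), (-7.0, 8.0)]
decisions = [attribute(x, models, THRESHOLD) for x in incoming]

# Cluster everything rejected as unknown; a dense cluster is treated as a new source.
unknown = [x for x, d in zip(incoming, decisions) if d is None]
labels = dbscan(unknown, eps=0.5, min_pts=3)
for c in sorted(set(labels) - {-1}):
    members = [x for x, l in zip(unknown, labels) if l == c]
    models[f"new_{c}"] = fit_gaussian(members)

print(decisions, labels, attribute((10.05, 0.0), models, THRESHOLD))
```

In this toy run, the first two samples are attributed to the known sources and the rest are rejected. The four points near (10, 0) form one cluster that is added as a new source, while the isolated point stays noise. A system of the kind the abstract describes would also check that the update does not degrade performance before accepting it.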
