Explainable Artifacts for Synthetic Western Blot Source Attribution
João Phillipe Cardenuto, Sara Mandelli, Daniel Moreira, Paolo Bestagini, Edward Delp, Anderson Rocha
TL;DR
The paper addresses detecting AI-generated Western blot images and attributing them to the generative model that produced them, as a countermeasure against paper mills. It introduces explainable artifacts derived from residual noise, patch-based Fourier analysis (PATCH-FFT-PEAKS), and Fourier-based texture features (FFT-GLCM), combined with selective residual-noise extraction methods, to detect synthetic content and identify the generator model. In closed-set, open-set, and one-vs-rest experiments on authentic and synthetic Western blots, the hand-crafted features perform strongly for attribution and remain robust in the open-set setting, where deep-learning features lag. The work offers practical forensic tools for provenance analysis of biomedical imagery and suggests extending attribution to additional models and image types, in support of scientific integrity and efforts against paper mills.
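To make the idea of residual-noise and patch-based Fourier artifacts more concrete, the following is a minimal sketch of the general technique, not the authors' implementation. Everything in it is an assumption made for illustration: the 3x3 mean filter used as a stand-in denoiser, the 64-pixel patch size, the helper names (`box_blur`, `residual_noise`, `mean_patch_spectrum`, `peak_score`), and the synthetic period-4 pattern that imitates the kind of periodic trace a generator's upsampling might leave.

```python
import numpy as np

def box_blur(img, k=3):
    """Mean filter with a k x k window (edge-padded); a stand-in denoiser."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def residual_noise(img):
    """Residual = image minus its smoothed version (hypothetical extractor)."""
    img = img.astype(np.float64)
    return img - box_blur(img)

def mean_patch_spectrum(residual, patch=64):
    """Average the log-magnitude FFT over non-overlapping square patches."""
    h, w = residual.shape
    spectra = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            block = residual[y:y + patch, x:x + patch]
            mag = np.abs(np.fft.fftshift(np.fft.fft2(block)))
            spectra.append(np.log1p(mag))
    return np.mean(spectra, axis=0)

def peak_score(spectrum):
    """Ratio of the strongest non-DC bin to the median bin."""
    s = spectrum.copy()
    cy, cx = s.shape[0] // 2, s.shape[1] // 2
    s[cy, cx] = 0.0  # ignore the DC component (centered by fftshift)
    return float(s.max() / (np.median(s) + 1e-12))

# Toy data (assumption): Gaussian noise as a "pristine" image, and the same
# image with a period-4 horizontal pattern imitating a generator artifact.
rng = np.random.default_rng(0)
clean = rng.normal(128, 10, size=(256, 256))
yy, xx = np.mgrid[0:256, 0:256]
periodic = clean + 8.0 * np.cos(2 * np.pi * xx / 4)

spec_clean = mean_patch_spectrum(residual_noise(clean))
spec_periodic = mean_patch_spectrum(residual_noise(periodic))
score_clean = peak_score(spec_clean)
score_periodic = peak_score(spec_periodic)
peak_row, peak_col = np.unravel_index(np.argmax(spec_periodic), spec_periodic.shape)
```

In this toy setup, the periodic pattern survives the residual extraction and shows up as an isolated peak in the averaged patch spectrum, so `score_periodic` exceeds `score_clean`. Averaging over patches is a design choice that suppresses image content, which varies from patch to patch, while reinforcing artifacts that repeat at the same frequency everywhere. A texture descriptor such as a gray-level co-occurrence matrix could then be computed on the spectrum, but that step is omitted here.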
Abstract
Recent advancements in artificial intelligence have enabled generative models to produce synthetic scientific images that are indistinguishable from pristine ones, posing a challenge even for expert scientists accustomed to working with such content. When exploited by organizations known as paper mills, which systematically generate fraudulent articles, these technologies can significantly contribute to the spread of misinformation based on ungrounded science, potentially undermining trust in scientific research. While previous studies have explored black-box solutions, such as Convolutional Neural Networks, for identifying synthetic content, only some have addressed the challenge of generalizing across different models and of providing insight into the artifacts in synthetic images that inform the detection process. This study aims to identify explainable artifacts generated by state-of-the-art generative models (e.g., Generative Adversarial Networks and Diffusion Models) and to leverage them for open-set identification and source attribution (i.e., pointing to the model that created the image).
