
DeepfakeArt Challenge: A Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection

Hossein Aboutalebi, Dayou Mao, Rongqi Fan, Carol Xu, Chris He, Alexander Wong

TL;DR

This work tackles the risk of copyright infringement and data poisoning in generative AI art by introducing the DeepfakeArt Challenge, a benchmark of over 32,000 image pairs derived from WikiArt sources through Inpainting, Style Transfer, Adversarial Poisoning, and Cutmix. It formalizes two core problems: Art Forgery, defined through region-based similarity criteria that allow for transformations, and Adversarial Data Poisoning, defined through small perturbations that can alter model outputs. The dataset is designed to emulate real-world copyright violations and adversarial contamination, and it has undergone comprehensive quality checks. Empirical evaluation across several embedding models reveals high false negative rates, indicating that current methods struggle to reliably detect infringement and poisoning. These results underscore the need for more robust detection approaches, and the benchmark is released publicly to support progress toward them.
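A minimal sketch of how a false negative rate could be measured for an embedding-based detector. It assumes each image has already been mapped to a vector by some embedding model and that a pair is flagged as a forgery or poisoning case when the cosine similarity of its two embeddings reaches a threshold. The function names and the 0.8 threshold are hypothetical illustrations, not the paper's evaluation protocol.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def false_negative_rate(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    labels: Sequence[bool],
    threshold: float = 0.8,  # hypothetical decision threshold
) -> float:
    """FNR = FN / (FN + TP), computed over pairs whose ground-truth label is positive.

    A pair is predicted positive (forgery or poisoned) when the cosine
    similarity of its two embeddings is at least `threshold`.
    """
    tp = fn = 0
    for (emb_a, emb_b), is_positive in zip(pairs, labels):
        if not is_positive:
            continue  # negative pairs do not enter the FNR
        if cosine_similarity(emb_a, emb_b) >= threshold:
            tp += 1
        else:
            fn += 1
    positives = tp + fn
    return fn / positives if positives else 0.0
```

Raising the threshold trades false positives for false negatives. A high FNR at a reasonable threshold means that many true forgery or poisoning pairs end up dissimilar in the chosen embedding space.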

Abstract

The tremendous recent advances in generative artificial intelligence techniques have led to significant successes and promise in a wide range of applications, from conversational agents and textual content generation to voice and visual synthesis. Amid the rise of generative AI and its increasingly widespread adoption, there has been significant and growing concern over the use of generative AI for malicious purposes. In the realm of visual content synthesis using generative AI, key areas of concern have been image forgery (e.g., generation of images containing or derived from copyrighted content) and data poisoning (i.e., generation of adversarially contaminated images). Motivated to address these key concerns and to encourage responsible generative AI, we introduce the DeepfakeArt Challenge, a large-scale challenge benchmark dataset designed specifically to aid in building machine learning algorithms for generative AI art forgery and data poisoning detection. The dataset comprises over 32,000 records across a variety of generative forgery and data poisoning techniques; each entry consists of a pair of images that either does or does not constitute a forgery or adversarial contamination. Each of the generated images in the DeepfakeArt Challenge benchmark dataset (dataset link: http://anon_for_review.com) has been comprehensively quality checked.
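Adversarial contamination is commonly formalized as a small, norm-bounded perturbation of an original image. The sketch below assumes an L-infinity budget `epsilon` on pixel values in [0, 1], with images stored as flat lists of equal length; the names and the default budget of 8/255 are hypothetical and not taken from the paper. For a given image pair it only checks whether the candidate stays within such a budget of the original; it does not test whether the perturbation actually changes a model's output.

```python
from typing import Sequence


def linf_distance(original: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest absolute per-pixel difference between two equally sized images."""
    if len(original) != len(candidate):
        raise ValueError("images must have the same number of pixels")
    return max((abs(o - c) for o, c in zip(original, candidate)), default=0.0)


def within_perturbation_budget(
    original: Sequence[float],
    candidate: Sequence[float],
    epsilon: float = 8 / 255,  # hypothetical L-infinity budget
) -> bool:
    """True if `candidate` differs from `original` by at most `epsilon` in every pixel."""
    return linf_distance(original, candidate) <= epsilon
```

A bound like this describes what "small" means for a poisoned image; whether the perturbation is adversarial depends on the target model and is outside the scope of this check.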

Paper Structure

This paper contains 6 sections, 2 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Example images from the proposed DeepfakeArt Challenge dataset.
  • Figure 2: (left) Examples of various masks used for generating forgery pairs in the inpainting category. (right) Examples of generated original-forgery image pairs in the style transfer category.
  • Figure 3: Distribution of data.
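Figure 2 refers to masks used to build inpainting forgery pairs. As a rough illustration of region-based pair construction, the sketch below builds a rectangular binary mask and uses it for a Cutmix-style composite that pastes the masked region of one image into another. Images are assumed to be nested lists of grayscale values, and the rectangle coordinates are arbitrary; the function names are hypothetical. In an inpainting pipeline the masked region would instead be regenerated by a generative model, which this sketch does not attempt.

```python
from typing import List

Image = List[List[float]]  # rows of grayscale pixel values (assumed layout)
Mask = List[List[bool]]


def rectangular_mask(height: int, width: int, top: int, left: int,
                     box_h: int, box_w: int) -> Mask:
    """Binary mask that is True inside the given rectangle, clipped to the image bounds."""
    return [
        [top <= r < top + box_h and left <= c < left + box_w for c in range(width)]
        for r in range(height)
    ]


def cutmix(base: Image, donor: Image, mask: Mask) -> Image:
    """Return a new image with `donor` pixels where `mask` is True and `base` pixels elsewhere."""
    same_shape = len(base) == len(donor) == len(mask) and all(
        len(b) == len(d) == len(m) for b, d, m in zip(base, donor, mask)
    )
    if not same_shape:
        raise ValueError("images and mask must have the same shape")
    return [
        [d if m else b for b, d, m in zip(base_row, donor_row, mask_row)]
        for base_row, donor_row, mask_row in zip(base, donor, mask)
    ]
```

The inputs are left unmodified, so the original image and the composite can be stored side by side as an original-forgery pair.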