CLOFAI: A Dataset of Real And Fake Image Classification Tasks for Continual Learning

William Doherty, Anton Lee, Heitor Murilo Gomes

TL;DR

CLOFAI introduces a domain-incremental benchmark for real-versus-fake image classification, designed to evaluate continual learning as new generative models emerge. The authors benchmark three foundational methods: Elastic Weight Consolidation (EWC), Gradient Episodic Memory (GEM), and Experience Replay (ER). They find that EWC underperforms, while GEM and ER effectively mitigate forgetting, and that pretraining provides substantial gains. The dataset consists of five tasks, each with 5,000 real CIFAR-10 images and 5,000 fake images produced by a distinct generative model, and is accompanied by code on GitHub for replication. Overall, CLOFAI demonstrates the viability of continual learning approaches for adapting to an evolving generative-model landscape and highlights GEM and ER as promising baselines for robust fake-image detectors.

Abstract

The rapid advancement of generative AI models capable of creating realistic media has led to a need for classifiers that can accurately distinguish between genuine and artificially generated images. A significant challenge for these classifiers emerges when they encounter images from generative models that are not represented in their training data, usually resulting in diminished performance. A typical approach is to periodically update the classifier's training data with images from the new generative models and then retrain the classifier on the updated dataset. However, in some real-life scenarios, storage, computational, or privacy constraints render this approach impractical. Additionally, models used in security applications may be required to adapt rapidly. In these circumstances, continual learning provides a promising alternative, as the classifier can be updated without retraining on the entire dataset. In this paper, we introduce a new dataset called CLOFAI (Continual Learning On Fake and Authentic Images), which takes the form of a domain-incremental image classification problem. Moreover, we showcase the applicability of this dataset as a benchmark for evaluating continual learning methodologies. In doing this, we set a baseline on our novel dataset using three foundational continual learning methods (EWC, GEM, and Experience Replay) and find that EWC performs poorly, while GEM and Experience Replay show promise, performing significantly better than a Naive baseline. The dataset and code to run the experiments can be accessed from the following GitHub repository: https://github.com/Will-Doherty/CLOFAI.
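
As a rough illustration of the domain-incremental setup described above, the sketch below builds a stream of five binary (real vs. fake) tasks and a small memory of the kind an Experience Replay method might keep. It is not the authors' implementation: the sample identifiers are placeholders rather than images, the label encoding, the `make_task_stream` and `ReplayBuffer` names, and the use of reservoir sampling are all assumptions made for illustration, and the sketch does not model whether the real CIFAR-10 images are shared across tasks.

```python
import random

# Counts taken from the text: five tasks, 5,000 real and 5,000 fake images each.
NUM_TASKS = 5
REAL_PER_TASK = 5000
FAKE_PER_TASK = 5000

# Assumed label encoding (not stated in the text).
REAL, FAKE = 0, 1


def make_task_stream(num_tasks=NUM_TASKS,
                     real_per_task=REAL_PER_TASK,
                     fake_per_task=FAKE_PER_TASK):
    """Return a list of tasks; each task is a list of (sample_id, label).

    Sample ids are hypothetical placeholders standing in for images.
    The label space (real vs. fake) is the same in every task; only the
    source of the fake images changes, which is what makes the problem
    domain-incremental.
    """
    stream = []
    for t in range(num_tasks):
        real = [(f"real_task{t}_{i}", REAL) for i in range(real_per_task)]
        fake = [(f"fake_generator{t}_{i}", FAKE) for i in range(fake_per_task)]
        stream.append(real + fake)
    return stream


class ReplayBuffer:
    """Fixed-capacity memory filled by reservoir sampling (an assumption)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        # Reservoir sampling: every item seen so far has an equal chance
        # of being in the buffer once the stream exceeds the capacity.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        # Draw up to k stored samples to mix into the current batch.
        return self.rng.sample(self.items, min(k, len(self.items)))


if __name__ == "__main__":
    stream = make_task_stream()
    buffer = ReplayBuffer(capacity=500)
    for task_id, task in enumerate(stream):
        for sample in task:
            # A real training loop would update the classifier here on the
            # current sample plus buffer.sample(...) from earlier tasks.
            buffer.add(sample)
        print(f"after task {task_id}: buffer holds {len(buffer.items)} samples")
```

The classifier sees the tasks one after another, and the buffer is the only route by which examples from earlier generative models reach later training steps. This is the storage-limited alternative to retraining on the full accumulated dataset.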

Paper Structure

This paper contains 14 sections, 2 equations, 2 figures, 11 tables.

Figures (2)

  • Figure 1: Problem Setup
  • Figure 2: Example of a Fake Image for Each Task