CIA: Controllable Image Augmentation Framework Based on Stable Diffusion

Mohamed Benkedadra, Dany Rimez, Tiffanie Godelaine, Natarajan Chidambaram, Hamed Razavi Khosroshahi, Horacio Tellez, Matei Mancas, Benoit Macq, Sidi Ahmed Mahmoudi

TL;DR

This work presents CIA, a modular pipeline for generating synthetic images for dataset augmentation with Stable Diffusion, and for enforcing specific patterns in the generated images through accurate prompting and ControlNet.

Abstract

Computer vision tasks such as object detection and segmentation rely on the availability of extensive, accurately annotated datasets. In this work, we present CIA, a modular pipeline for (1) generating synthetic images for dataset augmentation using Stable Diffusion, (2) filtering out low-quality samples using defined quality metrics, and (3) enforcing specific patterns in the generated images through accurate prompting and ControlNet. To show how CIA can be used to search for an optimal augmentation pipeline for training data, we study human object detection in a data-constrained scenario, using YOLOv8n on the COCO and Flickr30k datasets. We recorded a significant improvement using CIA-generated images, approaching the performance obtained when doubling the number of real images in the dataset. Our findings suggest that our modular framework can significantly enhance object detection systems and enable future research on data-constrained scenarios. The framework is available at: github.com/multitel-ai/CIA.
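
As a rough illustration of steps (2) and (3) of the abstract, the sketch below selects synthetic images from a generated pool, either by a quality score or at random, and appends them to the real training images. It is not taken from the CIA repository: the `Synthetic` record, the function names, and the convention that a higher score means better quality are all assumptions made for this example. Scores would in practice come from a metric of the user's choice; for a lower-is-better metric the sign would need to be flipped first.

```python
import random
from dataclasses import dataclass


@dataclass
class Synthetic:
    """Hypothetical record for one generated image (not a CIA type)."""
    path: str
    score: float  # assumed convention: higher means better quality


def select_top_k(samples, k):
    """Keep the k samples with the highest quality score."""
    return sorted(samples, key=lambda s: s.score, reverse=True)[:k]


def select_random(samples, k, seed=0):
    """Baseline: pick k samples uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(samples, min(k, len(samples)))


def build_training_set(real_paths, synthetic, k, use_quality=True):
    """Return the real image paths followed by k selected synthetic paths."""
    if use_quality:
        chosen = select_top_k(synthetic, k)
    else:
        chosen = select_random(synthetic, k)
    return list(real_paths) + [s.path for s in chosen]


if __name__ == "__main__":
    # Placeholder file names and scores, for illustration only.
    pool = [
        Synthetic("gen_a.png", 0.2),
        Synthetic("gen_b.png", 0.9),
        Synthetic("gen_c.png", 0.5),
    ]
    print(build_training_set(["real_0.jpg"], pool, k=2))
```

Keeping the selection strategy behind a single function argument makes it straightforward to compare quality-based sampling against the random baseline on the same pool of generated images.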

Paper Structure

This paper contains 26 sections, 6 figures.

Figures (6)

  • Figure 1: CIA-generated images from an image taken from the COCO dataset for different ControlNets, which are either efficient (OpenPose, Canny Edge) or inefficient (MediaPipe) for an object-detection task. The prediction of YOLOv8n trained on the dataset corresponding to the image is shown in red, and the ground truth in green.
  • Figure 2: The CIA Framework for improving object detection accuracy through data augmentation using Stable Diffusion and ControlNet. Real images are taken from the COCO dataset. Notations used in the figure are further explained in the text.
  • Figure 3: Examples of synthetic images generated with ControlNets Segmentation and False-Segmentation from the same real image as in Fig. 1. Left: YOLOv8m-seg's segmentation mask of the real image (top) and synthetic image generated (bottom). Right: synthetic image generated using the transposed segmentation mask.
  • Figure 4: Performance evaluation of the trained YOLOv8 models on the test set. Influence of 5 ControlNets (Canny Edge, OpenPose, MediaPipe, Segmentation and False-Segmentation) (a) on the COCO dataset and (b) on the Flickr dataset. Evaluation of the gain from using synthetic images in addition to data augmentation on the COCO dataset: (c) medium and (d) high.
  • Figure 5: Performance evaluation of the trained YOLOv8 models on the test set. Influence of sampling methods (ClipIQA, NIMA, BRISQUE, CORE-SET, confidence) on the COCO dataset for ControlNet (a) Canny Edge and (b) MediaPipe. "random" sampling refers to plots (a) and (b) of Fig. 4, for which the synthetic images are selected randomly.
  • ...and 1 more figures