Data Augmentation for Image Classification using Generative AI

Fazle Rahat, M Shifat Hossain, Md Rubel Ahmed, Sumit Kumar Jha, Rickard Ewetz

TL;DR

We address data scarcity and domain shift in fine-grained image recognition by proposing Automated Generative Data Augmentation (AGA), a segmentation-guided, language-informed augmentation framework. AGA isolates the subject with SAM and GroundingDINO, generates diverse background captions via a prompt-engineered LLM, synthesizes backgrounds with diffusion models, and merges them with affine-transformed subjects to preserve foreground integrity. The approach yields substantial gains on ImageNet-derived tasks (15.6% in-distribution and 23.5% out-of-distribution improvements) and a notable 64.3% SIC enhancement, while also improving explainability through Grad-CAM analyses. These results demonstrate improved fine-grained classification and generalization in low-data regimes, with potential for broader application, though subject-background compatibility remains a limitation to address in future work.

Abstract

Scaling laws dictate that the performance of AI models is proportional to the amount of available data. Data augmentation is a promising way to expand dataset size. Traditional approaches focus on augmentation using rotation, translation, and resizing. Recent approaches use generative AI models to improve dataset diversity. However, generative methods struggle with issues such as subject corruption and the introduction of irrelevant artifacts. In this paper, we propose Automated Generative Data Augmentation (AGA). The framework combines large language models (LLMs), diffusion models, and segmentation models to augment data. AGA preserves foreground authenticity while ensuring background diversity. Specific contributions include: i) segment- and superclass-based object extraction, ii) prompt diversity with combinatorial complexity using prompt decomposition, and iii) affine subject manipulation. We evaluate AGA against state-of-the-art (SOTA) techniques on three representative datasets: ImageNet, CUB, and iWildCam. The experimental evaluation demonstrates accuracy improvements of 15.6% and 23.5% on in-distribution and out-of-distribution data, respectively, compared to baseline models. There is also a 64.3% improvement in SIC score compared to the baselines.
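
To illustrate why prompt decomposition gives combinatorial diversity, here is a minimal sketch: if a background caption is assembled from independent components, the number of distinct captions is the product of the component list sizes. The component names and phrases below (`LOCATIONS`, `TIMES`, `WEATHER`) are hypothetical placeholders, not the vocabulary or decomposition used by AGA, and the paper generates captions with an LLM rather than from fixed lists.

```python
from itertools import product

# Hypothetical caption components (assumed for illustration only).
LOCATIONS = ["a forest", "a beach", "a city street"]
TIMES = ["at dawn", "at noon", "at night"]
WEATHER = ["in fog", "in clear weather"]


def background_prompts(locations, times, weather):
    """Return every caption formed by picking one phrase from each component list."""
    return [
        f"{loc} {time} {sky}"
        for loc, time, sky in product(locations, times, weather)
    ]


prompts = background_prompts(LOCATIONS, TIMES, WEATHER)

# 3 locations x 3 times x 2 weather phrases = 18 distinct captions.
print(len(prompts))
print(prompts[0])
```

Adding one more phrase to any component multiplies the caption count rather than adding to it, which is the sense in which decomposition scales diversity.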

Paper Structure (21 sections, 15 figures, 4 tables)

Figures (15)

  • Figure 1: Example augmentation using text-to-image, image-to-image, inpainting, and our approach on ImageNet10. Images generated by text-to-image and image-to-image lose significant foreground information. Inpainting gives comparatively better results but corrupts the foreground with unnecessary modifications. AGA produces diverse backgrounds while keeping the foreground grounded in the original images.
  • Figure 2: The methodology of the AGA framework. The inputs are an image and its original class name, while the outputs are the corresponding augmented images. Subject isolation from the input is performed by masked image generation. The domain caption generation engine produces diverse background prompts, which Stable Diffusion uses to generate background images. Finally, these background images and the isolated subjects are combined to generate augmented images.
  • Figure 3: Masked image generation process diagram.
  • Figure 4: Prompt generation for background diversity.
  • Figure 5: Merging image mask with the generated backgrounds while utilizing affine transformations.
  • ...and 10 more figures
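
The merging step named in the Figure 5 caption can be sketched roughly as follows: warp the isolated subject and its mask with the same affine transform, then paste the warped subject onto a generated background wherever the warped mask is set. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation. The helper names (`affine_warp`, `composite`), the nearest-neighbour sampling, and the hard (non-feathered) mask are all assumptions, and which affine transforms AGA actually applies is not specified in this section.

```python
import numpy as np


def affine_warp(image, matrix):
    """Warp an H x W (x C) array with a 2x3 affine matrix.

    `matrix` maps each OUTPUT pixel (x, y) to the SOURCE pixel it is read
    from (inverse mapping). Sampling is nearest-neighbour; output pixels
    whose source falls outside the image are left at zero.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(matrix[0, 0] * xs + matrix[0, 1] * ys + matrix[0, 2]).astype(int)
    src_y = np.rint(matrix[1, 0] * xs + matrix[1, 1] * ys + matrix[1, 2]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out


def composite(subject, mask, background, matrix):
    """Paste the affine-warped subject onto the background using the warped mask."""
    warped_subject = affine_warp(subject, matrix)
    warped_mask = affine_warp(mask, matrix).astype(bool)
    return np.where(warped_mask[..., None], warped_subject, background)


# Toy example: a one-pixel "subject" on a 4x4 canvas, shifted one pixel right.
subject = np.zeros((4, 4, 3), dtype=np.uint8)
subject[1, 1] = 255
mask = np.zeros((4, 4), dtype=bool)
mask[1, 1] = True
background = np.full((4, 4, 3), 7, dtype=np.uint8)

# Inverse mapping for a +1 shift in x: output (x, y) reads from (x - 1, y).
shift_right = np.array([[1.0, 0.0, -1.0],
                        [0.0, 1.0, 0.0]])
augmented = composite(subject, mask, background, shift_right)
```

Because the subject pixels are copied rather than regenerated, the foreground is left untouched by the diffusion model, which is the property the captions above attribute to AGA.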