GeNIe: Generative Hard Negative Images Through Diffusion

Soroush Abbasi Koohpayegani, Anuj Singh, K L Navaneet, Hamed Pirsiavash, Hadi Jamali-Rad

TL;DR

GeNIe is a novel augmentation method that uses a latent diffusion model conditioned on a text prompt to combine two contrasting data points (an image from the source category and a text prompt from the target category), generating a challenging augmentation that serves as a hard negative sample for the source category.

Abstract

Data augmentation is crucial in training deep models, preventing them from overfitting to limited data. Recent advances in generative AI, e.g., diffusion models, have enabled more sophisticated augmentation techniques that produce data resembling natural images. We introduce GeNIe, a novel augmentation method that leverages a latent diffusion model conditioned on a text prompt to combine two contrasting data points (an image from the source category and a text prompt from the target category) to generate challenging augmentations. To achieve this, we adjust the noise level (equivalently, the number of diffusion iterations) to ensure the generated image retains low-level and background features from the source image while representing the target category, resulting in a hard negative sample for the source category. We further automate and enhance GeNIe by adaptively selecting the noise level on a per-image basis (coined GeNIe-Ada), leading to further performance improvements. Our extensive experiments, in both few-shot and long-tail distribution settings, demonstrate the effectiveness of our novel augmentation method and its superior performance over the prior art. Our code is available at: https://github.com/UCDvision/GeNIe
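
The noise-level mechanism described in the abstract can be illustrated with a minimal sketch. It assumes a standard DDPM-style forward process with a linear variance schedule; the schedule values, the 1000-step horizon, and every function name below are illustrative assumptions, not details taken from the paper or its repository. A noise ratio r in [0, 1] is mapped to a number of noising steps, and the source latent is noised only that far, so a smaller r keeps more of the source image.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (assumed; a common DDPM default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t): the fraction of signal variance kept."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def start_timestep(noise_ratio, num_steps):
    """Map a noise ratio r in [0, 1] to a number of forward noising steps."""
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must lie in [0, 1]")
    return round(noise_ratio * num_steps)


def partially_noise(latent, noise_ratio, alpha_bars, rng):
    """Noise a source latent up to step t = r * T in closed form.

    Uses x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    The reverse (denoising) pass, conditioned on the target-category
    prompt, would then start from the returned latent at step t.
    """
    t = start_timestep(noise_ratio, len(alpha_bars))
    if t == 0:
        return list(latent), 0  # r = 0 leaves the source untouched
    alpha_bar = alpha_bars[t - 1]
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    noisy = [signal * x + sigma * rng.gauss(0.0, 1.0) for x in latent]
    return noisy, t
```

With r close to 1 almost no source signal survives, so generation is driven by the text prompt alone; with r close to 0 the output stays near the source image and may not match the target label. The hard negatives the abstract describes lie between these two extremes.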

Paper Structure (16 sections, 11 figures, 9 tables, 1 algorithm)

Figures (11)

  • Figure 1: Generative Hard Negative Images Through Diffusion (GeNIe): GeNIe generates hard negative images that belong to the target category but are similar to the source image in terms of low-level features and context. GeNIe starts from a source image, passes it through a partial noise addition process, and conditions it on a different target category. By controlling the amount of noise, the reverse latent diffusion process generates images that serve as hard negatives for the source category.
  • Figure 2: Effect of noise ratio, $r$, in GeNIe: we employ GeNIe to generate augmentations for the target classes (motorcycle and cat) with varying $r$. Smaller $r$ yields images closely resembling the source semantics, creating an inconsistency with the intended target label. By tracing $r$ from $0$ to $1$, augmentations gradually transition from source image characteristics to the target category. However, a distinct shift from the source to the target occurs at a specific $r$ that may vary for different source images or target categories. For more examples, please refer to Fig. \ref{fig:noise_ab_supp}.
  • Figure 3: GeNIe-Ada
  • Figure 4: Visualization of Generative Samples: We compare GeNIe with two baselines. Img2Img$^{L}$ augmentation: both the image and the text prompt are from the same category. Adding noise does not change the image much, so the results are not hard examples. Txt2Img augmentation: We use only the text prompt to generate an image for the desired category (e.g., using a text2image method). Such images may be far from the domain of our task since the generation is not informed by any visual data from our task. GeNIe augmentation: We use the target category name in the text prompt only, along with the source image.
  • Figure 5: Embedding visualizations of generative augmentations: We pass all generative augmentations through DINOv2 ViT-G (serving as an oracle) to extract their corresponding embeddings and visualize them with PCA. As shown, the extent of semantic shifts varies based on both the source image and the target class.
  • ...and 6 more figures
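
The captions of Figures 2 and 5 note that the shift from source to target occurs at a value of $r$ that differs per image, which motivates per-image selection. The sketch below shows one hypothetical way to pick such an $r$, and it should not be read as the authors' GeNIe-Ada algorithm. It assumes embeddings are already available for the augmentations generated at each candidate $r$, along with reference embeddings for the source and target; the function name, the projection rule, and the "largest jump" criterion are all assumptions made for illustration.

```python
def select_noise_ratio(ratios, aug_embeddings, source_emb, target_emb):
    """Pick the ratio right after the largest semantic jump (hypothetical rule).

    ratios: candidate noise ratios in increasing order.
    aug_embeddings: one embedding (list of floats) per candidate ratio.
    source_emb, target_emb: reference embeddings for source and target.
    """
    if len(ratios) != len(aug_embeddings) or len(ratios) < 2:
        raise ValueError("need at least two ratios, one embedding each")

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Direction from the source reference to the target reference.
    direction = [t - s for s, t in zip(source_emb, target_emb)]
    norm_sq = dot(direction, direction)
    if norm_sq == 0.0:
        raise ValueError("source and target embeddings coincide")

    # Scalar position of each augmentation along that direction:
    # about 0 near the source, about 1 near the target.
    positions = [
        dot([e - s for e, s in zip(emb, source_emb)], direction) / norm_sq
        for emb in aug_embeddings
    ]

    # The largest increase between consecutive ratios marks the shift.
    jumps = [positions[i + 1] - positions[i] for i in range(len(positions) - 1)]
    k = max(range(len(jumps)), key=jumps.__getitem__)
    return ratios[k + 1]
```

Choosing the ratio just after the jump selects the smallest amount of noise at which the augmentation has already moved toward the target, so it keeps as much of the source's low-level content as this criterion allows.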