DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood, Karthik Nandakumar

TL;DR

DiffuseMix introduces a label-preserving data augmentation pipeline that combines diffusion-model generation with original imagery via concatenation and adds fractal blending to maximize structural diversity. By using a curated set of conditional prompts, it generates semantically aligned hybrids $H_{iju}$ and final augmented images $A_{ijuv}$, mitigating label ambiguity common to other image-mixing methods. Across seven datasets and multiple tasks, DiffuseMix yields consistent improvements in general and fine-grained classification, data-scarcity scenarios, transfer learning, and adversarial robustness, while maintaining compatibility with existing augmentation strategies. The approach demonstrates practical impact by enhancing generalization and robustness with a manageable augmentation overhead, supported by comprehensive ablations and supplementary results, and is complemented by a dedicated fractal dataset to further diversify structure.

Abstract

Recently, a number of image-mixing-based augmentation techniques have been introduced to improve the generalization of deep neural networks. In these techniques, two or more randomly selected natural images are mixed together to generate an augmented image. Such methods may not only omit important portions of the input images but also introduce label ambiguities by mixing images across labels, resulting in misleading supervisory signals. To address these limitations, we propose DiffuseMix, a novel data augmentation technique that leverages a diffusion model to reshape training images, supervised by our bespoke conditional prompts. First, a partial natural image is concatenated with its generated counterpart, which helps avoid the generation of unrealistic images and label ambiguities. Then, to enhance resilience against adversarial attacks and improve safety measures, a randomly selected structural pattern from a set of fractal images is blended into the concatenated image to form the final augmented image for training. Our empirical results on seven different datasets reveal that DiffuseMix achieves superior performance compared to existing state-of-the-art methods on tasks including general classification, fine-grained classification, fine-tuning, data scarcity, and adversarial robustness. Augmented datasets and code are available here: https://diffusemix.github.io/
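
The two steps described in the abstract can be written compactly. The following is a sketch under stated assumptions, not necessarily the paper's exact formulation: $I_i$ is an input image, $\hat{I}_{ij}$ its diffusion-generated counterpart under prompt $j$, $M_u$ a binary mask, $F_v$ a fractal image, $\odot$ denotes element-wise multiplication, and $\lambda \in [0, 1]$ is an assumed blending weight that this summary does not specify.

$$
H_{iju} = M_u \odot I_i + (1 - M_u) \odot \hat{I}_{ij}, \qquad A_{ijuv} = (1 - \lambda)\, H_{iju} + \lambda\, F_v
$$

Under this reading, both regions of $H_{iju}$ come from the same source image $I_i$, so the hybrid can keep the label of $I_i$ without mixing classes.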

Paper Structure (28 sections, 4 equations, 12 figures, 18 tables, 1 algorithm)

This paper contains 28 sections, 4 equations, 12 figures, 18 tables, 1 algorithm.

Figures (12)

  • Figure 1: Architecture of the proposed DiffuseMix approach. An input image and a randomly selected prompt are input to a diffusion model to obtain a generated image. Input and generated images are concatenated using a binary mask to obtain a hybrid image. A random fractal image is finally blended with this hybrid image to obtain the augmented image.
  • Figure 2: A set of bespoke conditional prompts is used to obtain generated images that preserve important features and add rich visual appearance to the input images.
  • Figure 3: Example images from different stages of DiffuseMix: input image ($I_i$), generated image ($\hat{I}_{ij}$), mask ($M_u$), hybrid image ($H_{iju}$), fractal image ($F_v$), and final augmented image ($A_{ijuv}$).
  • Figure 4: Augmentation overhead (+%) - accuracy (%) plot on CUB-200-2011 dataset with batch size $32$.
  • Figure 5: First row: original training image samples from different datasets: Oxford-102 Flower nilsback2008automated, Stanford Cars krause20133d, Aircraft maji2013fine, CUB-200-2011, and CIFAR-100. Second row: the corresponding generated images show that using descriptive prompts (blue text) results in poor images that are not suitable for training. Generating images for CIFAR-100 poses additional challenges because the images are small. For example, the CIFAR-100 image in the last column, with its corresponding prompt, yields a black image containing no visible output.
  • ...and 7 more figures
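
The stages named in the Figure 1 and Figure 3 captions (mask-based concatenation, then fractal blending) can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the authors' implementation: the function names, the four half-image mask layouts, and the default blending weight `lam=0.2` are all hypothetical, and the diffusion step is left out, so the generated image is simply passed in as an array.

```python
import numpy as np


def make_mask(height, width, kind="left"):
    """Binary mask M_u: 1 keeps the input image, 0 keeps the generated image.

    The four half-image layouts are an assumption made for illustration.
    """
    mask = np.zeros((height, width, 1), dtype=np.float32)
    if kind == "left":
        mask[:, : width // 2] = 1.0
    elif kind == "right":
        mask[:, width // 2:] = 1.0
    elif kind == "top":
        mask[: height // 2] = 1.0
    elif kind == "bottom":
        mask[height // 2:] = 1.0
    else:
        raise ValueError(f"unknown mask kind: {kind}")
    return mask


def make_hybrid(image, generated, mask):
    """Hybrid H_iju: input pixels where mask == 1, generated pixels elsewhere."""
    return mask * image + (1.0 - mask) * generated


def blend_fractal(hybrid, fractal, lam=0.2):
    """Augmented A_ijuv: convex combination of the hybrid and a fractal image.

    lam is a hypothetical blending weight, not a value taken from this page.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * hybrid + lam * fractal


def diffusemix_sketch(image, generated, fractal, kind="left", lam=0.2):
    """Run both stages on float arrays of shape (H, W, C) with values in [0, 1]."""
    height, width = image.shape[:2]
    mask = make_mask(height, width, kind)
    return blend_fractal(make_hybrid(image, generated, mask), fractal, lam)
```

A typical call would pass a training image, the output of an image-to-image diffusion model for that same image, and a randomly chosen fractal, all resized to the same shape. The mask has a trailing singleton channel axis so that it broadcasts across the colour channels.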