Decoupled Data Augmentation for Improving Image Classification

Ruoxin Chen, Zhe Wang, Ke-Yue Zhang, Shuang Wu, Jiamu Sun, Shouli Wang, Taiping Yao, Shouhong Ding

TL;DR

Decoupled Data Augmentation (De-DA) resolves the fidelity-diversity dilemma by separating images into class-dependent parts (CDPs) and class-independent parts (CIPs) and handling them adaptively. It edits CDPs under controlled conditions and replaces each image's CIP with inter-class variants, creating diverse CDP-CIP combinations.

Abstract

Recent advancements in image mixing and generative data augmentation have shown promise in enhancing image classification. However, these techniques face the challenge of balancing semantic fidelity with diversity. Specifically, image mixing interpolates two images to create a new one, but this pixel-level interpolation can compromise fidelity. Generative augmentation uses text-to-image generative models to synthesize or modify images, and it often limits diversity to avoid generating out-of-distribution data that may degrade accuracy. We propose that this fidelity-diversity dilemma partially stems from the whole-image paradigm of existing methods. An image comprises a class-dependent part (CDP) and a class-independent part (CIP), and each part affects the image's fidelity in a fundamentally different way, so treating the two parts uniformly can be misleading. To address this dilemma, we introduce Decoupled Data Augmentation (De-DA), which separates images into CDPs and CIPs and handles them adaptively. To maintain fidelity, we use generative models to modify real CDPs under controlled conditions, preserving semantic consistency. To enhance diversity, we replace the image's CIP with inter-class variants, creating diverse CDP-CIP combinations. Additionally, we implement an online randomized combination strategy during training to generate numerous distinct CDP-CIP combinations cost-effectively. Comprehensive empirical evaluations validate the effectiveness of our method.

Paper Structure

This paper contains 30 sections, 3 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Visualization of different data augmentation methods. Columns 2-4: CutMix, MixUp, and SaliencyMix interpolate two images at the pixel level. Columns 5-7: DiffuseMix, DA-Fusion, and Diff-Mix use generative models to make semantic modifications to the input image. Column 8: Our proposed method, De-DA, fuses image mixing with generative data augmentation. De-DA edits the class-dependent part of one image using a generative model, then mixes it with another image's class-independent part to create a realistic and diverse image.
  • Figure 2: Illustration of the mechanisms of different data augmentation methods. Row 1: Image-mixing methods, such as Mixup [mixup] and CutMix [cutmix], create mixed images through pixel-level interpolation. DiffuseMix [islam2024diffusemix] uses style prompts (e.g., "Sunset") to transform input images, generating varied-style images which are then concatenated to form a hybrid image. DA-Fusion [trabucco2024effective] uses the intra-class identifier $V1_{\text{image}}$, while Diff-Mix [wang2024diffmix] uses another class's identifier $V2_{\text{image}}$ to translate natural images with SDEdit, but these methods suffer from limited variety or constrained fidelity. Row 2: Our proposed De-DA maintains fidelity by editing CDPs conditioned on $V1_{\text{CDP}}$ through an image-to-image diffusion pipeline designed specifically for transparent images. It also enhances diversity by replacing CIPs and applying random transformations to CDPs, resulting in faithful and diverse images.
  • Figure 3: The pipeline of De-DA. Left: (1) Images are decoupled into CDPs and CIPs. Missing regions in each CIP are inpainted, creating a pool of inter-class inpainted CIPs. (2) Truncated-Timestep Textual Inversion (TTTI) is applied to the real CDPs to efficiently learn the class-specific identifiers $V_1, V_2, \dots$ for each class. These identifiers are then used to semantically modify real CDPs into new synthetic CDPs. Right: (3) CDPs and CIPs are combined by pairing a real or synthetic CDP (with probability $1-p_{\text{syn}}$ or $p_{\text{syn}}$, respectively) with a randomly selected CIP to create a new image. With probability $p_{\text{mix}}$, an inter-class CDP is added to generate a mixed-CDPs image.
  • Figure 4: Examples illustrating the difference between applying SDEdit to entire images and to isolated CDPs. In generative methods, the background can degrade the output of SDEdit. Row 1: The generative model misinterprets a person in red clothing in the background as the bird's crest. Row 2: A person in the background is mistakenly merged into the bird during SDEdit, reducing the fidelity of the translated image. Row 3: Ice in the background is misrepresented as birds. In contrast, the right three columns show images generated by De-DA, which applies textual inversion and SDEdit to isolated CDPs. This allows modifications to avian features such as feathers, eyes, and legs without altering the labels. By focusing on isolated CDPs, De-DA mitigates the influence of background noise on the translation process.
  • Figure 5: (a) Comparison on multi-label classification. (b) Comparison on diversity by PSNR.
  • ...and 4 more figures
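
The online combination step described in the Figure 3 caption (step 3) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `paste_cdp`, `paste_at_random`, and `sample_combination` are hypothetical, CDPs are assumed to be RGBA cut-outs no larger than the CIP backgrounds, random placement stands in for the "random transformations" applied to CDPs, and the sketch returns the list of classes present rather than guessing how the paper weights labels for mixed-CDPs images.

```python
import numpy as np


def paste_cdp(background, cdp_rgba, top, left):
    """Alpha-composite an RGBA CDP cut-out onto an RGB background (a CIP)."""
    out = background.astype(np.float32).copy()
    h, w = cdp_rgba.shape[:2]
    alpha = cdp_rgba[..., 3:4].astype(np.float32) / 255.0
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = (
        alpha * cdp_rgba[..., :3] + (1.0 - alpha) * region
    )
    return out.astype(np.uint8)


def paste_at_random(background, cdp_rgba, rng):
    """Paste the CDP at a uniformly random position where it fits entirely."""
    H, W = background.shape[:2]
    h, w = cdp_rgba.shape[:2]
    top = int(rng.integers(0, H - h + 1))
    left = int(rng.integers(0, W - w + 1))
    return paste_cdp(background, cdp_rgba, top, left)


def sample_combination(real_cdps, syn_cdps, labels, cips, p_syn, p_mix, rng):
    """Draw one CDP-CIP combination.

    real_cdps[i] / syn_cdps[i]: RGBA arrays for sample i (assumed layout).
    labels[i]: class index of sample i.
    cips: pool of inpainted RGB backgrounds from any class.
    Returns the composed image and the list of classes it contains.
    """
    i = int(rng.integers(len(real_cdps)))
    # Synthetic CDP with probability p_syn, real CDP otherwise.
    cdp = syn_cdps[i] if rng.random() < p_syn else real_cdps[i]
    # The CIP is drawn from the whole pool, so it may come from another class.
    cip = cips[int(rng.integers(len(cips)))]
    image = paste_at_random(cip, cdp, rng)
    classes = [labels[i]]
    # With probability p_mix, add a CDP from a different class.
    if rng.random() < p_mix:
        others = [j for j, y in enumerate(labels) if y != labels[i]]
        if others:
            j = others[int(rng.integers(len(others)))]
            image = paste_at_random(image, real_cdps[j], rng)
            classes.append(labels[j])
    return image, classes
```

Because each call only indexes into precomputed pools and performs one or two alpha composites, new combinations can be drawn on the fly inside a data loader, which is one plausible reading of why the caption's combination step is cheap relative to running a generative model per sample.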