Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

Yanghao Wang, Long Chen

TL;DR

The authors argue that today’s diffusion-based DA methods cannot account for both faithfulness and diversity, two critical keys for generating high-quality samples and boosting classification performance.

Abstract

Data Augmentation (DA), i.e., synthesizing faithful and diverse samples to expand the original training set, is a prevalent and effective strategy to improve the performance of various data-scarce tasks. Thanks to the powerful image generation ability of diffusion models, diffusion-based DA has shown strong performance gains on different image classification benchmarks. In this paper, we analyze today's diffusion-based DA methods, and argue that they cannot account for both faithfulness and diversity, which are two critical keys for generating high-quality samples and boosting classification performance. To this end, we propose a novel diffusion-based DA method: Diff-II. Specifically, it consists of three steps: 1) Category concepts learning: Learning concept embeddings for each category. 2) Inversion interpolation: Calculating the inversion for each image, and conducting circle interpolation for two randomly sampled inversions from the same category. 3) Two-stage denoising: Using different prompts to generate synthesized images in a coarse-to-fine manner. Extensive experiments on various data-scarce image classification tasks (e.g., few-shot, long-tailed, and out-of-distribution classification) have demonstrated its effectiveness over state-of-the-art diffusion-based DA methods.
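
The interpolation in step 2 can be illustrated with a small sketch. The abstract does not give the exact formula, so the code below uses standard spherical interpolation (slerp) between two latents as a stand-in for the paper's circle interpolation. The function name `circle_interpolate`, the parameter `t`, and the latent shape in the usage note are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def circle_interpolate(z_a, z_b, t):
    """Slerp between two same-shaped latents (illustrative stand-in).

    t = 0 returns z_a, t = 1 returns z_b; values in between move along
    the arc spanned by the two latents instead of the straight line.
    """
    a = z_a.ravel().astype(np.float64)
    b = z_b.ravel().astype(np.float64)

    # Angle between the two latents, treated as flat vectors.
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

    # Parallel or antiparallel latents: the arc is undefined,
    # so fall back to plain linear interpolation.
    if np.isclose(np.sin(theta), 0.0):
        return (1.0 - t) * z_a + t * z_b

    w_a = np.sin((1.0 - t) * theta) / np.sin(theta)
    w_b = np.sin(t * theta) / np.sin(theta)
    return w_a * z_a + w_b * z_b
```

A possible use, under the same assumptions: draw two inversions of the same category, e.g. arrays of a hypothetical shape `(4, 64, 64)`, sample `t` at random, and pass the result of `circle_interpolate(z_a, z_b, t)` to the denoiser. Unlike linear interpolation, which shrinks the norm of the midpoint of two nearly orthogonal Gaussian latents, the arc-based weights keep the interpolated latent close to the norm of its endpoints.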

Paper Structure (14 sections, 8 equations, 9 figures, 4 tables)


Figures (9)

  • Figure 1: Given training images, data augmentation aims to generate new faithful and diverse synthetic images. (a) These synthetic images are faithful but not diverse. (b) These synthetic images are diverse but not faithful. (c) These synthetic images are both faithful and diverse.
  • Figure 2: (a) Intra-category DA: Given a reference image (from the original set), it adds some noise and denoises with a prompt containing the same category concept (e.g., concept "[A]" for category A image). (b) Inter-category DA: Different from Intra-category DA, it denoises with a prompt containing a different category concept (e.g., concept "[B]" for category A image). (c) Ours: It first calculates the inversion for each image, and conducts random circle interpolation for two images of the same category. Then, it denoises in a two-stage manner with different prompts.
  • Figure 3: Pipeline of Diff-II. (1) Concept Learning: Learning accurate concepts for each category. (2) Inversion Interpolation: Calculating the DDIM inversion for each image conditioned on the learned concept. Then, randomly sampling a pair and conducting random circle interpolation. (3) Two-stage Denoising: Denoising the interpolation results in a two-stage manner with different prompts.
  • Figure 3: OOD classification. "L", "W" represent "land" and "water", respectively. Results are averaged on three trials.
  • Figure 4: An illustration for the proposed circle interpolation.
  • ...and 4 more figures