Table of Contents
Fetching ...

Image Augmentation with Controlled Diffusion for Weakly-Supervised Semantic Segmentation

Wangyu Wu, Tianhong Dai, Xiaowei Huang, Fei Ma, Jimin Xiao

TL;DR

This paper introduces a novel approach called Image Augmentation with Controlled Diffusion (IACD), which effectively augments existing labeled datasets by generating diverse images through controlled diffusion, where the available images and image-level labels are served as the controlling information.

Abstract

Weakly-supervised semantic segmentation (WSSS), which aims to train segmentation models solely using image-level labels, has achieved significant attention. Existing methods primarily focus on generating high-quality pseudo labels using available images and their image-level labels. However, the quality of pseudo labels degrades significantly when the size of available dataset is limited. Thus, in this paper, we tackle this problem from a different view by introducing a novel approach called Image Augmentation with Controlled Diffusion (IACD). This framework effectively augments existing labeled datasets by generating diverse images through controlled diffusion, where the available images and image-level labels are served as the controlling information. Moreover, we also propose a high-quality image selection strategy to mitigate the potential noise introduced by the randomness of diffusion models. In the experiments, our proposed IACD approach clearly surpasses existing state-of-the-art methods. This effect is more obvious when the amount of available data is small, demonstrating the effectiveness of our method.

Image Augmentation with Controlled Diffusion for Weakly-Supervised Semantic Segmentation

TL;DR

This paper introduces a novel approach called Image Augmentation with Controlled Diffusion (IACD), which effectively augments existing labeled datasets by generating diverse images through controlled diffusion, where the available images and image-level labels are served as the controlling information.

Abstract

Weakly-supervised semantic segmentation (WSSS), which aims to train segmentation models solely using image-level labels, has achieved significant attention. Existing methods primarily focus on generating high-quality pseudo labels using available images and their image-level labels. However, the quality of pseudo labels degrades significantly when the size of available dataset is limited. Thus, in this paper, we tackle this problem from a different view by introducing a novel approach called Image Augmentation with Controlled Diffusion (IACD). This framework effectively augments existing labeled datasets by generating diverse images through controlled diffusion, where the available images and image-level labels are served as the controlling information. Moreover, we also propose a high-quality image selection strategy to mitigate the potential noise introduced by the randomness of diffusion models. In the experiments, our proposed IACD approach clearly surpasses existing state-of-the-art methods. This effect is more obvious when the amount of available data is small, demonstrating the effectiveness of our method.
Paper Structure (11 sections, 3 equations, 4 figures, 3 tables, 2 algorithms)

This paper contains 11 sections, 3 equations, 4 figures, 3 tables, 2 algorithms.

Figures (4)

  • Figure 1: (a) In the previous method, only images from the original dataset are used for training. (b) Our proposed IACD utilizes an diffusion model to generate synthetic images. Then, an image selection module is used to annotate and select the high-quality synthetic images to augment the original dataset for training.
  • Figure 2: The pipeline of our IACD. The airplane in the showcase is using the diffusion model with prompts to generate candidate images. Subsequently, the candidate images are filtered through an image selection process to ensure that only high-quality images are used as training data for the downstream WSSS.
  • Figure 3: The overall framework of IACD consists of several steps. Firstly, IACD utilizes controlled diffusion to generate entirely different images. Subsequently, the original image is processed using the Vision Transformer (ViT) as an encoder to generate patch embeddings, and a patch-driven classifier is trained for image categorization. Then, the generated diffusion images are passed through the same trained image classifier to select a high-quality image set. Moreover, the selected image set, along with the original images and their corresponding labels, is passed to the downstream WSSS task.
  • Figure 4: The comparison of qualitative segmentation results with ViT-PCM dosovitskiy2020image.