SynthSet: Generative Diffusion Model for Semantic Segmentation in Precision Agriculture

Andrew Heschl, Mauricio Murillo, Keyhan Najafian, Farhad Maleki

TL;DR

The results show that the proposed methodology is effective at addressing data scarcity for semantic segmentation in precision agriculture. The approach can be readily adapted to other segmentation tasks in precision agriculture and beyond.

Abstract

This paper introduces a methodology for generating synthetic annotated data to address data scarcity in semantic segmentation tasks within the precision agriculture domain. Utilizing Denoising Diffusion Probabilistic Models (DDPMs) and Generative Adversarial Networks (GANs), we propose a dual diffusion model architecture for synthesizing realistic annotated agricultural data, without any human intervention. We employ super-resolution to enhance the phenotypic characteristics of the synthesized images and their coherence with the corresponding generated masks. We showcase the utility of the proposed method for wheat head segmentation. The high quality of synthesized data underscores the effectiveness of the proposed methodology in generating image-mask pairs. Furthermore, models trained on our generated data exhibit promising performance when tested on an external, diverse dataset of real wheat fields. The results show the efficacy of the proposed methodology for addressing data scarcity for semantic segmentation tasks. Moreover, the proposed approach can be readily adapted for various segmentation tasks in precision agriculture and beyond.

Paper Structure

This paper contains 9 sections, 1 equation, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Data samples used for training, and their corresponding pseudo labels.
  • Figure 2: Our proposed pipeline, displayed from top to bottom: SharedEncoder approach, TwoEncoder approach, Concat approach, and GAN approach. In the GAN approach, Paired Diffusion can be implemented using either SharedEncoder, TwoEncoder, or Concat.
  • Figure 3: Super-resolution pipeline. High-resolution images are of size $256\times256$, while low-resolution images are of size $128\times128$.
  • Figure 4: Images generated by different variations of our model architecture. The top row displays the images, the center row shows the masks, and the bottom row showcases the images overlaid by their corresponding masks. The columns display SynthSet variations in the following order: (A) Concat, (B) Concat with Discriminator, (C) TwoEncoder, (D) TwoEncoder with Discriminator, and (E) SharedEncoder. Although each method generates realistic images, adding the discriminator to each variation further enhances the depth and realism of the generated wheat images.
  • Figure 5: Images and their masks, split into left and right halves. The left half shows bilinear upsampling for images and nearest-neighbor upsampling for masks. The right half shows the output of our super-resolution diffusion model.
  • ...and 5 more figures
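
The interpolation baseline named in the Figure 5 caption (bilinear upsampling for images, nearest-neighbor upsampling for masks) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `upsample_bilinear` and `upsample_nearest`, the half-pixel sampling convention, and the random stand-in data are all assumptions. Only the $128\times128$ to $256\times256$ sizes come from the Figure 3 caption.

```python
import numpy as np


def upsample_nearest(mask: np.ndarray, factor: int = 2) -> np.ndarray:
    """Repeat every pixel `factor` times along height and width.

    No new values are created, so a binary mask stays binary.
    """
    return np.repeat(np.repeat(mask, factor, axis=0), factor, axis=1)


def upsample_bilinear(image: np.ndarray, factor: int = 2) -> np.ndarray:
    """Bilinear upsampling of an (H, W, C) float image.

    Assumes half-pixel centers, with coordinates clamped at the borders.
    """
    h, w = image.shape[:2]
    # Position of each output pixel center, expressed in input coordinates.
    ys = np.clip((np.arange(h * factor) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * factor) + 0.5) / factor - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Fractional offsets, shaped to broadcast over (H', W', C).
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    # Interpolate along the width first, then along the height.
    top = image[y0][:, x0] * (1 - wx) + image[y0][:, x1] * wx
    bottom = image[y1][:, x0] * (1 - wx) + image[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


# Random stand-ins for a generated low-resolution image-mask pair (assumed data).
rng = np.random.default_rng(0)
low_res_image = rng.random((128, 128, 3))
low_res_mask = (rng.random((128, 128)) > 0.5).astype(np.uint8)

high_res_image = upsample_bilinear(low_res_image)
high_res_mask = upsample_nearest(low_res_mask)
print(high_res_image.shape, high_res_mask.shape)
```

Nearest-neighbor interpolation is the usual choice for masks because it never blends label values, whereas bilinear interpolation smooths the image and cannot add detail that is missing from the low-resolution input.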