Modified CycleGAN for the synthesization of samples for wheat head segmentation

Jaden Myers, Keyhan Najafian, Farhad Maleki, Katie Ovens

TL;DR

The paper tackles the challenge of scarce annotated data for crop segmentation by generating a large synthetic dataset and bridging the domain gap to real images with a segmentation-aware CycleGAN. It introduces a modified CycleGAN that takes segmentation masks as input to preserve semantic information during translation, producing from the synthetic dataset $S$ a translated dataset $\hat{R}$ that closely resembles real imagery. A U-Net-based segmentation model trained on $\hat{R}$ performs strongly on an internal dataset and on two external Global Wheat Head Detection (GWHD) datasets, and a pseudo-labeling fine-tuning step improves it further. The approach shows strong potential for scalable, domain-adaptive semantic segmentation in agriculture and could generalize to other crops and densely patterned imagery.
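
The key modification, passing the segmentation mask to the generator together with the image, can be sketched in NumPy as below. This is a minimal illustration, not the authors' code: the (H, W, C) array layout, the L1 form of the cycle-consistency loss (the choice made in the original CycleGAN), and the toy generators `toy_g_s2r` and `toy_g_r2s` are all assumptions. Reading Figure 2, we assume $G_{S \rightarrow R}$ maps 4 channels (image + mask) to a 3-channel image and $G_{R \rightarrow S}$ maps 3 channels back to 4, so that each cycle returns an array of the shape it started from.

```python
import numpy as np


def make_generator_input(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stack an RGB image (H, W, 3) and its binary mask (H, W) into an (H, W, 4) array."""
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must have the same height and width")
    return np.concatenate([image, mask[..., None].astype(image.dtype)], axis=-1)


def cycle_consistency_loss(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """L1 cycle-consistency loss: mean absolute difference (assumed, as in the original CycleGAN)."""
    return float(np.mean(np.abs(original - reconstructed)))


# Hypothetical toy generators that only reproduce the channel bookkeeping;
# trained networks would replace them.
def toy_g_s2r(x: np.ndarray) -> np.ndarray:
    """S -> R: 4-channel (image + mask) input to a 3-channel image."""
    return x[..., :3]


def toy_g_r2s(y: np.ndarray) -> np.ndarray:
    """R -> S: 3-channel image to 4 channels (image + a crudely estimated mask)."""
    estimated_mask = (y.mean(axis=-1) > 0.5).astype(y.dtype)
    return np.concatenate([y, estimated_mask[..., None]], axis=-1)


image = np.full((2, 2, 3), 0.8)
mask = np.zeros((2, 2))
x = make_generator_input(image, mask)  # x in S, shape (2, 2, 4)
x_cycle = toy_g_r2s(toy_g_s2r(x))      # S -> R -> S
print(x.shape, cycle_consistency_loss(x, x_cycle))  # (2, 2, 4) 0.25
```

Here the toy reverse generator predicts a mask of ones where the input mask was all zeros, so only the mask channel (4 of 16 values) contributes to the loss.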

Abstract

Deep learning models have been used for a variety of image processing tasks. However, most of these models are developed through supervised learning approaches, which rely heavily on the availability of large-scale annotated datasets. Developing such datasets is tedious and expensive. In the absence of an annotated dataset, synthetic data can be used for model development; however, due to the substantial differences between simulated and real data, a phenomenon referred to as the domain gap, the resulting models often underperform when applied to real data. In this research, we aim to address this challenge by first computationally simulating a large-scale annotated dataset and then using a generative adversarial network (GAN) to bridge the gap between simulated and real images. This approach results in a synthetic dataset that can be used effectively to train a deep-learning model. Using this approach, we developed a realistic annotated synthetic dataset for wheat head segmentation. This dataset was then used to develop a deep-learning model for semantic segmentation. The resulting model achieved a Dice score of 83.4% on an internal dataset and Dice scores of 79.6% and 83.6% on two external Global Wheat Head Detection datasets. While we proposed this approach in the context of wheat head segmentation, it can be generalized to other crop types or, more broadly, to images with dense, repeated patterns such as those found in cellular imagery.
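
The Dice scores above measure the overlap between a predicted mask $P$ and a ground-truth mask $G$ as $2|P \cap G| / (|P| + |G|)$. A minimal NumPy sketch for binary masks follows; returning 1.0 when both masks are empty is our own convention, and the paper may aggregate scores differently (for example, averaging per image rather than pooling pixels over a whole dataset).

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient 2 * |P and G| / (|P| + |G|) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * intersection / total)


pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(pred, target))  # 2 * 2 / (3 + 3), about 0.667
```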

Paper Structure

This paper contains 12 sections, 6 figures, and 1 table.

Figures (6)

  • Figure 1: A visualization of the pipeline used to generate synthetic images. Wheat head cutouts are extracted from a manually annotated real wheat image and background frames are extracted from the background videos. The wheat heads are then randomly overlaid onto background frames to generate a wheat head image and a semantic segmentation mask.
  • Figure 2: Diagram of the modified CycleGAN. The generator $G_{S \rightarrow R} : S \rightarrow R$ takes as input synthetic images concatenated with their semantic segmentation masks, $x \in S$, and outputs a corresponding realistic image $G_{S \rightarrow R}(x) \in \hat{R}$. A cycle-consistency loss is calculated between $x$ and $G_{R \rightarrow S}(G_{S \rightarrow R}(x))$. Although not shown in the diagram, a cycle-consistency loss is also calculated in the opposite direction between real images $y \in R$ and $G_{S \rightarrow R}(G_{R \rightarrow S}(y))$, and the discriminator $D_{S}$ calculates an adversarial loss using $G_{R \rightarrow S}(y) \in S$ and $x$.
  • Figure 3: Examples of randomly selected GWHD images and the corresponding pseudo-mask predictions. The top row shows samples that were selected for the dataset used to fine-tune the model; the bottom row shows samples that were not selected.
  • Figure 4: Each box shows randomly selected synthetic images (left) and the corresponding outputs of our modified CycleGAN (right). The images on the far right are randomly selected real wheat images for comparison.
  • Figure 5: Synthetic wheat image and the corresponding output from an unmodified CycleGAN. The red circles highlight the flaws of the unmodified CycleGAN image translation.
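
The compositing step of Figure 1 can be sketched as follows: each wheat-head cutout is pasted at a random position on a background frame, and the same pixels are set to 1 in the segmentation mask, so the image and its label are produced together. This NumPy sketch rests on assumptions not stated in the source: each cutout comes with its own binary shape mask, placement is uniform at random, later cutouts occlude earlier ones, and no scaling, rotation, or color blending is applied. The function name `compose_synthetic_sample` and the tiny arrays are hypothetical.

```python
import numpy as np


def compose_synthetic_sample(background, cutouts, rng):
    """Paste (rgb, mask) cutouts onto a copy of `background`.

    Returns the composited image and a uint8 semantic mask (1 = wheat head).
    """
    image = background.copy()
    seg_mask = np.zeros(background.shape[:2], dtype=np.uint8)
    bg_h, bg_w = background.shape[:2]
    for rgb, cut_mask in cutouts:
        h, w = cut_mask.shape
        if h > bg_h or w > bg_w:
            continue  # skip cutouts that do not fit in the frame
        top = int(rng.integers(0, bg_h - h + 1))
        left = int(rng.integers(0, bg_w - w + 1))
        region = cut_mask.astype(bool)
        # Basic slicing returns views, so these writes modify image and seg_mask.
        image[top:top + h, left:left + w][region] = rgb[region]
        seg_mask[top:top + h, left:left + w][region] = 1
    return image, seg_mask


rng = np.random.default_rng(0)
background = np.zeros((8, 8, 3), dtype=np.uint8)    # stand-in background frame
head_rgb = np.full((3, 2, 3), 200, dtype=np.uint8)  # stand-in wheat-head cutout
head_mask = np.array([[0, 1], [1, 1], [1, 1]])      # its binary shape mask
image, mask = compose_synthetic_sample(background, [(head_rgb, head_mask)], rng)
print(mask.sum())  # 5 pixels labeled as wheat head
```

Because the label is written in the same step as the pixels, every synthetic image comes with an exact mask at no annotation cost, which is what makes a large simulated dataset cheap to build.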