Semi-Self-Supervised Domain Adaptation: Developing Deep Learning Models with Limited Annotated Data for Wheat Head Segmentation

Alireza Ghanbari, Gholamhassan Shirdel, Farhad Maleki

TL;DR

This work tackles domain shift in wheat head segmentation under variable growth stages and environmental conditions by introducing a semi-self-supervised domain adaptation framework built on a probabilistic diffusion process. The method employs a dual-stream encoder–decoder that trains on synthetically annotated images and unannotated real video frames, so it learns segmentation while adapting its representations to real data. The best model achieves a Dice score of 0.807 on an internal test set and 0.648 on an external GWHD-based test set, outperforming a recent baseline and showing reduced variance across 18 domains, which indicates improved generalization with minimal manual labeling. The approach could support scalable, data-efficient deployment of deep learning in precision agriculture across diverse field conditions.

Abstract

Precision agriculture involves the application of advanced technologies to improve agricultural productivity, efficiency, and profitability while minimizing waste and environmental impact. Deep learning approaches enable automated decision-making for many visual tasks. However, in the agricultural domain, variability in growth stages and environmental conditions, such as weather and lighting, presents significant challenges to developing deep learning-based techniques that generalize across different conditions. The resource-intensive nature of creating extensive annotated datasets that capture these variabilities further hinders the widespread adoption of these approaches. To tackle these issues, we introduce a semi-self-supervised domain adaptation technique based on deep convolutional neural networks with a probabilistic diffusion process, requiring minimal manual data annotation. Using only three manually annotated images and a selection of video clips from wheat fields, we generated a large-scale computationally annotated dataset of image-mask pairs and a large dataset of unannotated images extracted from video frames. We developed a two-branch convolutional encoder-decoder model architecture that uses both synthesized image-mask pairs and unannotated images, enabling effective adaptation to real images. The proposed model achieved a Dice score of 80.7% on an internal test dataset and a Dice score of 64.8% on an external test set, composed of images from five countries and spanning 18 domains, indicating its potential to develop generalizable solutions that could encourage the wider adoption of advanced technologies in agriculture.
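
The Dice score reported above measures the overlap between a predicted mask P and a ground-truth mask G as 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `dice_score`, the flat 0/1 mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences.

    Assumption for illustration: masks are flattened to 1-D sequences of
    0 (background) and 1 (wheat head) before being passed in.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    # Pixels labelled 1 in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / total


# One overlapping pixel, two predicted pixels, one true pixel: 2*1 / (2+1).
example = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
print(round(example, 3))
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixels. How per-image scores are aggregated into the reported 80.7% and 64.8% is not stated in this section.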

Paper Structure (9 sections, 11 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Three manually annotated image-mask pairs were used for data synthesis. We built two training sets by synthesizing computationally annotated images from the manually annotated images on the left ($I_{\eta}$) and in the middle ($I_{\zeta}$), producing $8,000$ images based on $I_{\eta}$ and $8,000$ images based on $I_{\zeta}$. Hereafter, we refer to the $8,000$ images based on $I_{\eta}$ as dataset $\mathbb{D}_{\eta}$ and to the full set of $16,000$ images as $\mathbb{D}_{\eta + \zeta}$. Additionally, we created a validation set by synthesizing $4,000$ images: $2,000$ based on the image on the right ($I_{\tau}$) and $2,000$ based on $I_{\zeta}$. Hereafter, we refer to this set of $4,000$ images as $\mathbb{D}_{\zeta + \tau}$. Dataset $\mathbb{D}_{\zeta + \tau}$ was designed to give a balanced representation of wheat field images from the early and late growth stages. All computationally annotated samples were synthesized following the methodology described by Najafian et al. [najafian2023semi].
  • Figure 2: Examples of computationally synthesized images and their corresponding segmentation masks.
  • Figure 3: Schematic Representation of the Model Architecture. The encoder focuses on developing a joint image representation for both synthesized and real images, while the mask decoder aims at generating segmentation masks, and the image decoder aims at reconstructing the real images, forcing the encoder to adapt to the real images.
  • Figure 4: A ResNet block comprises three groups of operations, including convolution, GroupNorm layers, and the Swish activation function for nonlinearity. It also incorporates skip connections to enhance feature propagation.
  • Figure 5: The encoder architecture combines convolutional layers, ResNet blocks, and GroupNorm layers. In each of the two decoding streams, we use concatenation instead of addition.
  • ...and 1 more figures
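
The ResNet block described in the Figure 4 caption (convolution, GroupNorm, Swish, and a skip connection) can be sketched in NumPy as below. This is an illustrative sketch only, not the authors' implementation: the ordering GroupNorm, then Swish, then convolution within each group, the use of a 1x1 (pointwise) convolution, the omission of learnable GroupNorm scale and shift, and all function names, shapes, and the group count are assumptions, since the caption does not specify them.

```python
import numpy as np


def swish(x):
    """Swish (SiLU) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def group_norm(x, num_groups, eps=1e-5):
    """GroupNorm over a (C, H, W) array, without learnable scale/shift."""
    c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    # Each row holds all values of one group of consecutive channels.
    g = x.reshape(num_groups, -1)
    mean = g.mean(axis=1, keepdims=True)
    var = g.var(axis=1, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(c, h, w)


def pointwise_conv(x, weight):
    """1x1 convolution (assumed kernel size); weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def resnet_block(x, weights, num_groups=4):
    """Apply one (GroupNorm -> Swish -> conv) group per weight, then add the skip."""
    out = x
    for w in weights:
        out = pointwise_conv(swish(group_norm(out, num_groups)), w)
    # Skip connection: add the block input to the transformed features.
    return x + out


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16, 16))  # hypothetical feature map: 8 channels, 16x16
weights = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(3)]  # three groups
y = resnet_block(x, weights)
print(y.shape)  # (8, 16, 16)
```

Because the skip connection adds the input back unchanged, the block reduces to the identity when all convolution weights are zero, which is one reason such connections help feature propagation. Per the Figure 5 caption, the two decoding streams combine features by concatenation rather than by addition, which this sketch does not model.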