Semi-Self-Supervised Domain Adaptation: Developing Deep Learning Models with Limited Annotated Data for Wheat Head Segmentation
Alireza Ghanbari, Gholamhassan Shirdel, Farhad Maleki
TL;DR
This work addresses domain shift in wheat head segmentation caused by variable growth stages and environmental conditions. It introduces a semi-self-supervised domain adaptation framework built on a probabilistic diffusion process. The method uses a dual-stream encoder-decoder that learns from synthetically annotated images and unannotated real video frames, so it can segment wheat heads while adapting its representations to real data. The best model achieves a Dice score of 0.807 on an internal test set and 0.648 on an external test set derived from the Global Wheat Head Detection (GWHD) dataset, outperforming a recent baseline and showing reduced variance across 18 domains. These results indicate improved generalization with minimal manual labeling and suggest the approach could support scalable, data-efficient deployment of deep learning in precision agriculture across diverse field conditions.
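For readers unfamiliar with probabilistic diffusion, the sketch below shows the standard closed-form forward (noising) step used in denoising diffusion models, q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I). This is an illustrative assumption: the summary does not specify how the authors parameterize or use the diffusion process, and the linear schedule, its endpoints (1e-4 to 0.02), T = 1000, and the helper names `linear_beta_schedule` and `forward_noise` are hypothetical choices made for the example.

```python
import numpy as np

def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Hypothetical linear noise schedule beta_1..beta_T (common DDPM-style defaults)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(x0: np.ndarray, t: int, alphas_cumprod: np.ndarray,
                  rng: np.random.Generator):
    """Sample x_t ~ N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I) in one step.

    Returns the noisy sample and the Gaussian noise that produced it; a
    denoising network would typically be trained to predict that noise.
    """
    a_bar = alphas_cumprod[t]
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    return x_t, noise

T = 1000
betas = linear_beta_schedule(T)
alphas_cumprod = np.cumprod(1.0 - betas)  # a_bar_t, decreasing toward 0

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))  # stand-in for a normalized RGB frame
x_early, eps_early = forward_noise(x0, 10, alphas_cumprod, rng)    # still close to x0
x_late, eps_late = forward_noise(x0, T - 1, alphas_cumprod, rng)   # almost pure noise
```

Small t leaves the image nearly intact, while t near T yields approximately standard Gaussian noise; because the noise is known, x_0 can be recovered exactly from x_t, which is what makes the noise-prediction training target well defined.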
Abstract
Precision agriculture involves the application of advanced technologies to improve agricultural productivity, efficiency, and profitability while minimizing waste and environmental impact. Deep learning approaches enable automated decision-making for many visual tasks. However, in the agricultural domain, variability in growth stages and environmental conditions, such as weather and lighting, presents significant challenges to developing deep learning-based techniques that generalize across different conditions. The resource-intensive nature of creating extensive annotated datasets that capture these variabilities further hinders the widespread adoption of these approaches. To tackle these issues, we introduce a semi-self-supervised domain adaptation technique based on deep convolutional neural networks with a probabilistic diffusion process, requiring minimal manual data annotation. Using only three manually annotated images and a selection of video clips from wheat fields, we generated a large-scale computationally annotated dataset of image-mask pairs and a large dataset of unannotated images extracted from video frames. We developed a two-branch convolutional encoder-decoder model architecture that uses both synthesized image-mask pairs and unannotated images, enabling effective adaptation to real images. The proposed model achieved a Dice score of 80.7% on an internal test dataset and a Dice score of 64.8% on an external test set, composed of images from five countries and spanning 18 domains, indicating its potential to develop generalizable solutions that could encourage the wider adoption of advanced technologies in agriculture.
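The reported results use the Dice score, Dice = 2|P ∩ G| / (|P| + |G|), where P is the set of pixels predicted as wheat head and G the ground-truth set. A minimal sketch of that computation for binary masks follows; the smoothing term `eps`, the convention that two empty masks score 1.0, and the helper name `dice_score` are assumptions for illustration, and the abstract does not state whether scores are averaged per image or pooled over the test set.

```python
import numpy as np

def dice_score(pred, target, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks of the same shape.

    pred, target: array-likes where nonzero/True marks wheat-head pixels.
    eps (assumed) avoids division by zero and makes two empty masks score 1.0.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return float((2.0 * intersection + eps) / (total + eps))

# Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 ground truth -> 4 / 6
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)  # about 0.667
```

A model that outputs per-pixel probabilities would first be thresholded (for example at 0.5, another assumption) before computing this score.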
