Self-Supervised Data Generation for Precision Agriculture: Blending Simulated Environments with Real Imagery

Leonardo Saraceni, Ionut Marian Motoi, Daniele Nardi, Thomas Alessandro Ciarfuglia

TL;DR

The paper tackles the challenge of labeled-data scarcity and covariate shift in precision agriculture by blending real-world segmentation masks with photo-realistic synthetic vineyard imagery. It introduces a Unity-based CANOPIES vineyard simulator to generate synthetic data and uses real masks obtained via a YOLOv5-based detector and SAM, then pastes them onto synthetic images with PCA-guided alignment to create diverse, labeled samples. Empirical results on grape detection show that blending real instances into synthetic scenes (SyntheticPasted) and combining with pseudo-labeled real data yields the best performance, improving robustness to occlusions and illumination changes. The approach offers an automated, scalable data-generation pipeline suitable for adoption by farmers and adaptable to other crops.

Abstract

In precision agriculture, the scarcity of labeled data and significant covariate shifts pose unique challenges for training machine learning models. This scarcity is particularly problematic due to the dynamic nature of the environment and the evolving appearance of agricultural subjects as living things. We propose a novel system for generating realistic synthetic data to address these challenges. Utilizing a vineyard simulator based on the Unity engine, our system employs a cut-and-paste technique with geometrical consistency considerations to produce accurate photo-realistic images and labels from synthetic environments to train detection algorithms. This approach generates diverse data samples across various viewpoints and lighting conditions. We demonstrate considerable performance improvements in training a state-of-the-art detector by applying our method to table grapes cultivation. The combination of techniques can be easily automated, an increasingly important consideration for adoption in agricultural practice.

Paper Structure

This paper contains 11 sections, 4 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: a) The adopted operational environment in Aprilia (Lazio). b) Robotic platform used in the EU Project CANOPIES placed in the simulated environment. c) Synthetic image of the grapes captured from the robot camera point of view.
  • Figure 2: The proposed pipeline is divided into two parts. 1) The base detector (YOLOv5) extracts bounding boxes from real images captured in the vineyard. These boxes serve as input prompts for SAM, which extracts the segmentation mask of each instance; the masks are saved in a buffer $B_{real}$. 2) For every grape instance in the synthetic images, a real mask is randomly sampled from $B_{real}$ and aligned to the synthetic instance using PCA. The real instance is then rescaled and translated to overlap the synthetic one, and the two are blended.
  • Figure 3: Example of automatic detection using the base detector trained using automatically generated pseudo-labels (a) and segmentation by SAM using the detection as input prompt (b).
  • Figure 4: Example of a simulator image blended with real grape instances using the pasting method described in the paper's section on pasting.
  • Figure 5: Qualitative evaluation of the inference results of the YOLO nano models on frames extracted from different test sequences, using a confidence threshold of 0.25 and an IoU threshold of 0.3. The first row shows frame 39 from the CloseUp1 sequence; the second row, frame 3 from CloseUp2; the third row, frame 5 from the Overview2 sequence. Each column corresponds to a different training set. The model trained using only the Pseudo dataset (a) displays very low recall, with many false negatives, due to its poor generalization. The model trained on the SyntheticPasted set (b) improves on this baseline. The best model is obtained by training on the SyntheticPasted + Pseudo dataset (c), which handles occlusions, intense illumination, and large foreground clusters better than the other two.
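
The alignment step summarized in the Figure 2 caption (PCA on the masks, then rescaling and translation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `principal_axis` and `alignment_transform` are hypothetical, the scale factor is assumed to be the ratio of the standard deviations along the two principal axes (the exact rescaling rule is not stated here), and the blending step is omitted.

```python
import numpy as np


def principal_axis(mask):
    """Return (centroid, unit principal axis, spread along it) of a binary mask.

    Coordinates are in (x, y) pixel order. The spread is the standard
    deviation of the mask pixels along the first principal component.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)          # 2x2 covariance of pixel coords
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    return centroid, eigvecs[:, -1], np.sqrt(eigvals[-1])


def alignment_transform(real_mask, synth_mask):
    """Similarity transform (A, t) mapping real-mask pixels onto the synthetic mask.

    A point p of the real instance maps to A @ p + t. Assumption: rotation
    aligns the principal axes and scale matches their spreads.
    """
    c_r, a_r, s_r = principal_axis(real_mask)
    c_s, a_s, s_s = principal_axis(synth_mask)

    # A principal axis has no preferred sign; pick the one needing the
    # smaller rotation (at most 90 degrees).
    if np.dot(a_r, a_s) < 0:
        a_r = -a_r

    angle = np.arctan2(a_s[1], a_s[0]) - np.arctan2(a_r[1], a_r[0])
    scale = s_s / s_r

    c, s = np.cos(angle), np.sin(angle)
    A = scale * np.array([[c, -s], [s, c]])
    t = c_s - A @ c_r                         # centroid of real lands on centroid of synth
    return A, t


# Toy example: a horizontal "cluster" aligned to a vertical one twice its size.
real = np.zeros((60, 60), dtype=bool)
real[10:14, 5:25] = True
synth = np.zeros((120, 120), dtype=bool)
synth[30:70, 50:58] = True
A, t = alignment_transform(real, synth)
```

In this toy example the recovered transform rotates the real instance by about 90 degrees and roughly doubles its size. In practice the resulting matrix could be passed to an image-warping routine to resample the real crop before blending; which routine the paper uses is not specified here.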