Synthetic Crop-Weed Image Generation and its Impact on Model Generalization

Garen Boyadjian, Cyrille Pierre, Johann Laconte, Riccardo Bertoglio

TL;DR

This paper tackles the data bottleneck in crop-weed semantic segmentation for agricultural weeding robots by using a Blender-based CropCraft pipeline to procedurally generate diverse, annotated synthetic images. It benchmarks several segmentation architectures on synthetic and real datasets and analyzes their cross-domain generalization, reporting a sim-to-real gap of roughly 10%, smaller than that of previous state-of-the-art methods. A key finding is that models trained on synthetic data generalize better across domains than models trained on real datasets, and that even small amounts of real data can boost performance when combined with synthetic data, although real data alone may still outperform synthetic data on within-domain tasks. The work demonstrates the practicality of synthetic datasets for scalable training in agriculture and suggests hybrid strategies and domain adaptation techniques to further close the gap to real-world performance.

Abstract

Precise semantic segmentation of crops and weeds is necessary for agricultural weeding robots. However, training deep learning models requires large annotated datasets, which are costly to obtain in real fields. Synthetic data can reduce this burden, but the gap between simulated and real images remains a challenge. In this paper, we present a pipeline for procedural generation of synthetic crop-weed images using Blender, producing annotated datasets under diverse conditions of plant growth, weed density, lighting, and camera angle. We benchmark several state-of-the-art segmentation models on synthetic and real datasets and analyze their cross-domain generalization. Our results show that training on synthetic images leads to a sim-to-real gap of 10%, surpassing previous state-of-the-art methods. Moreover, synthetic data demonstrates good generalization properties, outperforming real datasets in cross-domain scenarios. These findings highlight the potential of synthetic agricultural datasets and support hybrid strategies for more efficient model training.
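
As a rough illustration of how a sim-to-real gap like the one quoted above could be measured, the sketch below computes mean intersection-over-union (mIoU) from pixel labels and takes the difference between a model trained on real images and a model trained on synthetic images, both evaluated on the same real test set. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the abstract does not say which metric or gap definition is used, so the choice of mIoU, the absolute (rather than relative) difference, the three-class layout (soil, crop, weed), and all function names here are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # truth c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, truth other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def sim_to_real_gap(miou_real_trained: float, miou_synth_trained: float) -> float:
    """Assumed definition: absolute mIoU drop on the same real test set."""
    return miou_real_trained - miou_synth_trained


# Toy flattened label maps (assumed classes: 0 = soil, 1 = crop, 2 = weed).
truth = [0, 0, 1, 1, 2, 2]
pred_real_trained = [0, 0, 1, 1, 2, 2]   # hypothetical model trained on real data
pred_synth_trained = [0, 0, 1, 1, 2, 1]  # hypothetical model trained on synthetic data

miou_real = mean_iou(confusion_matrix(truth, pred_real_trained, 3))
miou_synth = mean_iou(confusion_matrix(truth, pred_synth_trained, 3))
gap = sim_to_real_gap(miou_real, miou_synth)
print(f"mIoU real-trained:  {miou_real:.3f}")
print(f"mIoU synth-trained: {miou_synth:.3f}")
print(f"sim-to-real gap:    {gap:.3f}")
```

The toy numbers are invented purely to exercise the functions and say nothing about the paper's results. Under a relative definition, the gap would instead be divided by the real-trained score.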

Paper Structure

This paper contains 13 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Examples of synthetic images generated with CropCraft under different conditions of time of day, maize growth stage, weed density, and camera angle.
  • Figure 2: Examples of real images collected in maize fields during the ROSE and ACRE challenges. The datasets were acquired in different years with different robots and cameras.