Generating Diverse Agricultural Data for Vision-Based Farming Applications

Mikolaj Cieslak, Umabharathi Govindarajan, Alejandro Garcia, Anuradha Chandrashekar, Torsten Hädrich, Aleksander Mendoza-Drosik, Dominik L. Michels, Sören Pirk, Chia-Chun Fu, Wojciech Pałubicki

TL;DR

This paper addresses the need for diverse, labeled imagery in vision-based farming by introducing a specialized procedural pipeline that generates soybean-field scenes with weeds across growth stages, soils, and lighting conditions. Using Blender and L-system–based plant growth, the pipeline builds textures, soils, and field layouts, with an optional domain-adaptation step via a CUT GAN to narrow the gap between the synthetic and real domains. The authors generate 12,000 labeled synthetic images and 12,000 domain-adapted variants, and validate them with cosine similarity, t-SNE embeddings, and semantic segmentation benchmarks using DeepLabv3 and SegFormer. The results indicate that mixing synthetic with real data improves crop-weed IoU and generalizes to out-of-distribution crops such as cotton, although domain-adapted images do not always outperform rendered ones, which points to nuances in the domain gap for agricultural scenes. Overall, the work offers a cost-effective, scalable way to augment agricultural vision datasets and motivates future work on edge cases and generative ensembles in precision agriculture.
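The crop-weed IoU mentioned above is the standard intersection-over-union score computed per class on label maps. The following is a minimal sketch of that metric, not the authors' evaluation code: the function name `per_class_iou` and the class ids (0 = background, 1 = crop, 2 = weed) are assumptions made for illustration.

```python
def per_class_iou(pred, target, num_classes):
    """Return one IoU value per class id (None if the class never appears).

    pred and target are 2-D lists of integer class ids with the same shape.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts toward both intersection and union of its class.
                intersection[p] += 1
                union[p] += 1
            else:
                # Mismatch: the pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return [i / u if u else None for i, u in zip(intersection, union)]


# Hypothetical class ids: 0 = background, 1 = crop, 2 = weed.
target = [[0, 1, 1],
          [0, 2, 2]]
pred = [[0, 1, 2],
        [0, 2, 2]]

ious = per_class_iou(pred, target, num_classes=3)
```

In this toy case one crop pixel is predicted as weed, so the crop IoU drops to 1/2 and the weed IoU to 2/3, while the background stays at 1.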

Abstract

We present a specialized procedural model for generating synthetic agricultural scenes, focusing on soybean crops, along with various weeds. This model is capable of simulating distinct growth stages of these plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions. The integration of real-world textures and environmental factors into the procedural generation process enhances the photorealism and applicability of the synthetic data. Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture, such as semantic segmentation for autonomous weed control. We validate our model's effectiveness by comparing the synthetic data against real agricultural images, demonstrating its potential to significantly augment training data for machine learning models in agriculture. This approach not only provides a cost-effective solution for generating high-quality, diverse data but also addresses specific needs in agricultural vision tasks that are not fully covered by general-purpose models.

Paper Structure (13 sections, 8 figures, 3 tables)

Figures (8)

  • Figure 1: An example collection of 3D virtual plants generated by the procedural soybean model. Although the model has several parameters that control the plant's morphology, in practice, to create fields of plants, we vary the age of the plant (x axis) and a randomization seed (y axis).
  • Figure 2: Texture atlases used for the soybean plants. From top to bottom, the texture atlases represent the following: diffuse/albedo map, height map, normal map, roughness map, and alpha mask map. The diffuse maps in the top row were obtained from real images through automatic segmentation, whereas the remaining maps were generated procedurally from their respective diffuse maps.
  • Figure 3: Selected images from our synthetic dataset, showing variation in crop growth stages, crop spacing, weed distribution, soil type, crop orientation with respect to the camera, and amount of debris.
  • Figure 4: An example image from our synthetic dataset: (left) rendered, (middle) domain adapted, (right) generated labels, where red is crop, green is broadleaf weed, and blue is grassy weed.
  • Figure 5: (A) Cosine similarity test between 1,000 images in the real and synthetic datasets. The distributions are shown for real versus real images (excluding comparisons of an image with itself), synthetic versus real images, and domain-adapted versus real images. (B) t-SNE plot for the real (blue), synthetic (orange), and domain-adapted (green) datasets. Each point represents an image, reduced to a 2-dimensional projection from the 2048-dimensional feature vector extracted from a ResNet-50 network.
  • ...and 3 more figures
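
The cosine similarity comparison described in the Figure 5 caption can be sketched as follows. This is an illustrative sketch only, not the authors' code: the helper names are hypothetical, and short toy vectors stand in for the 2048-dimensional ResNet-50 features, whose extraction is not shown here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pairwise_similarities(set_a, set_b, same_set=False):
    """Similarities for every (a, b) pair.

    With same_set=True, pairs of an image with itself are skipped,
    mirroring the exclusion noted for the real-versus-real case.
    """
    sims = []
    for i, u in enumerate(set_a):
        for j, v in enumerate(set_b):
            if same_set and i == j:
                continue
            sims.append(cosine_similarity(u, v))
    return sims


# Toy 3-dimensional features (assumed values, for illustration only).
real = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
synthetic = [[1.0, 1.0, 0.0]]

real_vs_real = pairwise_similarities(real, real, same_set=True)
synthetic_vs_real = pairwise_similarities(synthetic, real)
```

Each returned list is one distribution of similarity scores; plotting the real-versus-real, synthetic-versus-real, and domain-adapted-versus-real lists as histograms would give a comparison of the kind the caption describes.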