A Scalable Pipeline Combining Procedural 3D Graphics and Guided Diffusion for Photorealistic Synthetic Training Data Generation in White Button Mushroom Segmentation
Artúr I. Károly, Péter Galambos
TL;DR
The paper addresses the scarcity of annotated data for mushroom segmentation with a hybrid pipeline that combines Blender-based 3D scene control with diffusion-model image generation guided by depth maps. It produces two synthetic datasets of 6,000 images each, with instance-level annotations, and evaluates zero-shot performance on real-world data and the M18K benchmark, where models trained only on synthetic data reach F1 = 0.859. The pipeline's key components are an IP-Adapter, LoRA modules, and a depth-guided ControlNet, which together keep the generated images aligned with the ground-truth masks without manual shader setup. Ablation analyses show which components drive realism and domain alignment, and the approach can be adapted to agricultural domains beyond white button mushrooms.
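To make the depth-guidance step concrete, the following is a minimal sketch of how a rendered depth map might be turned into a conditioning image for a depth-guided ControlNet. It is not taken from the paper: the function name `depth_to_control_image`, the `far_clip` parameter, and the "near is bright" normalization (a common convention for depth-conditioned models) are all assumptions made for illustration.

```python
import numpy as np

def depth_to_control_image(depth, far_clip=None):
    """Convert a metric depth map of shape (H, W) into an 8-bit RGB image.

    Assumption (not from the paper): nearer surfaces map to brighter pixels.
    Non-finite values (e.g. background with no geometry) are treated as far.
    """
    depth = np.asarray(depth, dtype=np.float64)
    finite = np.isfinite(depth)
    if far_clip is None:
        # Hypothetical default: use the farthest finite depth in the render.
        far_clip = depth[finite].max()
    # Replace non-finite pixels and cap everything at the far plane.
    d = np.clip(np.where(finite, depth, far_clip), None, far_clip)
    near, far = d.min(), d.max()
    if far == near:
        norm = np.zeros_like(d)  # flat scene: no depth contrast to encode
    else:
        norm = (far - d) / (far - near)  # 1.0 at nearest, 0.0 at farthest
    img = np.round(norm * 255).astype(np.uint8)
    # Replicate to three channels, since image pipelines usually expect RGB.
    return np.stack([img] * 3, axis=-1)

if __name__ == "__main__":
    demo = np.array([[1.0, 2.0], [3.0, np.inf]])
    print(depth_to_control_image(demo)[..., 0])
```

Because the instance masks come from the same 3D scene as the depth map, an image generated under this kind of conditioning is expected to stay aligned with the annotations.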
Abstract
Industrial mushroom cultivation increasingly relies on computer vision for monitoring and automated harvesting. However, developing accurate detection and segmentation models requires large, precisely annotated datasets that are costly to produce. Synthetic data provides a scalable alternative, yet often lacks sufficient realism to generalize to real-world scenarios. This paper presents a novel workflow that integrates 3D rendering in Blender with a constrained diffusion model to automatically generate high-quality, annotated, photorealistic synthetic images of Agaricus bisporus mushrooms. This approach preserves full control over 3D scene configuration and annotations while achieving photorealism without the need for specialized computer graphics expertise. We release two synthetic datasets (each containing 6,000 images depicting over 250k mushroom instances) and evaluate Mask R-CNN models trained on them in a zero-shot setting. When tested on two independent real-world datasets (including a newly collected benchmark), our method achieves state-of-the-art segmentation performance (F1 = 0.859 on M18K), despite using only synthetic training data. Although the approach is demonstrated on Agaricus bisporus mushrooms, the proposed pipeline can be readily adapted to other mushroom species or to other agricultural domains, such as fruit and leaf detection.
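For readers unfamiliar with how an instance-level F1 score such as the one quoted above can be computed, here is a minimal sketch. The abstract does not specify the matching protocol, so the greedy one-to-one matching and the IoU threshold of 0.5 are assumptions. The function names `mask_iou` and `instance_f1` are hypothetical.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two boolean masks of equal shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def instance_f1(pred_masks, gt_masks, iou_thresh=0.5):
    """F1 over instances, with greedy one-to-one matching by IoU.

    Assumption (not from the paper): a prediction counts as a true positive
    if its best unmatched ground-truth mask has IoU >= iou_thresh.
    """
    matched_gt = set()
    tp = 0
    for p in pred_masks:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gt_masks):
            if j in matched_gt:
                continue  # each ground-truth instance is matched at most once
            iou = mask_iou(p, g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thresh:
            matched_gt.add(best_j)
            tp += 1
    fp = len(pred_masks) - tp  # predictions with no matching ground truth
    fn = len(gt_masks) - tp    # ground-truth instances that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

if __name__ == "__main__":
    def square(r, c):
        m = np.zeros((4, 4), dtype=bool)
        m[r:r + 2, c:c + 2] = True
        return m

    gt = [square(0, 0), square(2, 2)]
    pred = [square(0, 0), square(0, 2)]  # one exact hit, one false positive
    print(instance_f1(pred, gt))
```

In the toy case at the bottom, one of two predictions matches one of two ground-truth instances, giving one true positive, one false positive, and one false negative.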
