Importance of realism in procedurally-generated synthetic images for deep learning: case studies in maize and canola

Nazifa Azam Khan; Mikolaj Cieslak; Ian McQuillan

TL;DR

This work investigates how realism in procedurally generated synthetic images, produced via L-systems, affects deep learning-based plant phenotyping for maize and canola. By systematically varying real-versus-synthetic training data and refining the canola L-system, the authors show that realistic synthetic data can substantially reduce the need for real annotations (especially in maize) and, when refined, can approach or match real-data performance for canola. Importantly, the study demonstrates a feedback loop where neural-network predictions guide L-system calibration, improving synthetic realism and downstream accuracy. The findings highlight the potential of realism-aware synthetic data to enable data-efficient phenotyping in diverse environments.

Abstract

Artificial neural networks are often used to identify features of crop plants. However, training these models requires many annotated images, which can be expensive and time-consuming to acquire. Procedural models of plants, such as those developed with Lindenmayer systems (L-systems), can be created to produce visually realistic simulations, and hence images of simulated plants for which annotations are implicitly known. These synthetic images can either augment or completely replace real images in training neural networks for phenotyping tasks. In this paper, we systematically vary the amounts of real and synthetic images used for training in both maize and canola to better understand the situations in which synthetic images generated from L-systems can help prediction on real images. This work also explores the degree to which realism in the synthetic images improves prediction. We use five variants of a procedural canola model, created by tuning realism through calibration, and the deep learning results show how drastically prediction improves as the synthetic canola images are made more realistic. Furthermore, we show how neural network predictions can be used to help calibrate the L-systems themselves, creating a feedback loop.
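
To illustrate what "annotations are implicitly known" means for an L-system, the following is a minimal sketch, not the authors' maize or canola model. The grammar, the symbols `A`, `I`, and `L`, and the function names are all hypothetical and chosen only for illustration: because the plant is derived from rewriting rules, a label such as leaf count can be read directly from the derived string instead of being annotated by hand.

```python
# Hypothetical toy grammar (an assumption, not taken from the paper):
# A = growing apex, I = internode, L = leaf, [ ] = branch brackets.
RULES = {"A": "I[L]A"}


def derive(axiom: str, steps: int) -> str:
    """Rewrite every symbol in parallel, `steps` times, as an L-system does."""
    word = axiom
    for _ in range(steps):
        # Symbols without a production rule are copied unchanged.
        word = "".join(RULES.get(symbol, symbol) for symbol in word)
    return word


def leaf_count(word: str) -> int:
    """The annotation comes for free: count the leaf symbols in the string."""
    return word.count("L")


def synthetic_sample(steps: int) -> tuple[str, int]:
    """Return a derived plant string together with its implicit label."""
    word = derive("A", steps)
    return word, leaf_count(word)


if __name__ == "__main__":
    plant, label = synthetic_sample(3)
    print(plant, label)
```

In a full pipeline the derived string would be interpreted geometrically and rendered to an image, and the same label would be attached to that image as training data.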

Paper Structure (14 sections, 9 figures, 4 tables)

Introduction
Dataset
Maize dataset:
Canola dataset:
Methodology
Maize study methodology:
Canola study methodology:
Results
Maize study results:
Canola study results:
Refinement of Canola L-systems
Discussion
Conclusion
Future Work

Figures (9)

Figure 1: (a) A real maize image on day 25 from the 0-degree view. (b) A synthetic maize plant on day 25 generated from the maize L-system.
Figure 1: The distribution of inflorescence branch counts in synthetic canola images changes from the first model, using $C^S_1$, to the fourth model, using $C^S_4$; this change helped to improve results.
Figure 2: (a) -- (e) Synthetic canola images generated from $C^S_1$ through $C^S_5$ respectively, at the same time point. (f) A real canola image at approximately the same time.
Figure 2: Distribution of inflorescence branch counts in real canola images used for training (left) and for testing (right).
Figure 3: Mean absolute losses for leaf counting in maize from Table \ref{tab_1}: training with only real images and testing with 100 real images (blue); training with synthetic images plus (optionally) some number of real images and testing with 100 real images (green); training with only real images and testing with the remaining real images (red); and training with synthetic images plus (optionally) some number of real images and testing with the remaining real images (pink).
...and 4 more figures
