LAESI: Leaf Area Estimation with Synthetic Imagery
Jacek Kałużny, Yannik Schreckenberg, Karol Cyganik, Peter Annighöfer, Sören Pirk, Dominik L. Michels, Mikolaj Cieslak, Farhah Assaad-Gerbert, Bedrich Benes, Wojciech Pałubicki
TL;DR
The paper addresses the scarcity and cost of annotated real leaf data by introducing LAESI, a synthetic dataset of 100,000 leaf images rendered on millimeter paper, each with a semantic mask and a leaf-area label. It combines fast, controllable 3D procedural generation of leaves and backgrounds in Unity with a ControlNet-based inpainting pipeline and a filtering step that keeps only samples whose annotations remain consistent with the image, enabling large-scale generation of domain-relevant data. Validation on real images shows that models trained on LAESI can predict leaf area with a relative error no greater than that of an average human annotator, while preserving segmentation performance. The work indicates that synthetic data, when carefully inpainted and filtered, can improve domain-specific vision tasks, with potential further applications in agriculture and biology.
Abstract
We introduce LAESI, a dataset of 100,000 synthetic leaf images on millimeter paper, each with semantic masks and surface area labels. This dataset provides a resource for leaf morphology analysis, primarily aimed at beech and oak leaves. We evaluate the applicability of the dataset by training machine learning models for leaf surface area prediction and semantic segmentation, using real images for validation. Our validation shows that these models can be trained to predict leaf surface area with a relative error not greater than that of an average human annotator. LAESI also provides an efficient framework based on 3D procedural models and generative AI for the large-scale, controllable generation of data, with potential further applications in agriculture and biology. We evaluate the inclusion of generative AI in our procedural data generation pipeline and show how data filtering based on annotation consistency results in datasets that allow training the highest-performing vision models.
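To make the quantities named in the abstract concrete, the following is a minimal sketch of how leaf area can be read off a binary mask, how a mean relative error over area predictions can be computed, and how an annotation-consistency filter might look. It is not the authors' implementation. The function names, the `mm_per_pixel` scale parameter, the use of intersection-over-union as the consistency measure, and the `min_iou` threshold of 0.9 are all assumptions made for illustration; the abstract does not specify the filtering criterion.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (1 = leaf pixel). Hypothetical type alias.
Mask = Sequence[Sequence[int]]


def mask_area_mm2(mask: Mask, mm_per_pixel: float) -> float:
    """Leaf area in mm^2 from a binary mask, assuming a known image scale."""
    leaf_pixels = sum(1 for row in mask for value in row if value)
    return leaf_pixels * mm_per_pixel ** 2


def mean_relative_error(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Mean of |predicted - reference| / reference over all samples."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - r) / r for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for value_a, value_b in zip(row_a, row_b):
            if value_a and value_b:
                intersection += 1
            if value_a or value_b:
                union += 1
    # Two empty masks are treated as identical.
    return intersection / union if union else 1.0


def is_consistent(label_mask: Mask, estimated_mask: Mask,
                  min_iou: float = 0.9) -> bool:
    """Assumed filter rule: keep a sample only if the label mask still
    agrees with a mask estimated from the (inpainted) image."""
    return mask_iou(label_mask, estimated_mask) >= min_iou
```

In this sketch, a generated sample would be kept only when `is_consistent` returns `True`, and `mean_relative_error` would be evaluated on real images by comparing model predictions against measured leaf areas.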
