Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development

Ranjan Sapkota; Achyut Paudel; Manoj Karkee

TL;DR

The paper addresses the cost and logistics of data collection for agricultural instance segmentation by proposing a fully automated, zero-shot workflow: orchard images are generated with an LLM (via DALL-E), masks are produced automatically with SAMv2, and the resulting dataset is used to train YOLO11 segmentation models. The automatically generated masks reach a Dice coefficient of 0.9513 and an IoU of 0.9303 on the synthetic data. On 42 real-world Azure Kinect images, YOLO11m-seg achieves a mask precision of 0.902 and a mask mAP@50 of 0.833, indicating that models trained only on synthetic data transfer to real orchards. The work reduces reliance on physical sensors and manual annotation, which could speed up development of agricultural AI for tasks such as fruit counting and robotic picking, and it provides a baseline for zero-shot instance segmentation in agricultural domains, with potential extensions to other crops and object classes.

Abstract

Currently, deep learning-based instance segmentation for various applications (e.g., agriculture) relies predominantly on a labor-intensive process: extensive field data collection using sophisticated sensors, followed by careful manual annotation of images. This presents significant logistical and financial challenges to researchers and organizations, and it slows down model development and training. In this study, we present a novel method for deep learning-based instance segmentation of apples in commercial orchards that eliminates the need for labor-intensive field data collection and manual annotation. Utilizing a Large Language Model (LLM), we synthetically generated orchard images and automatically annotated them using the Segment Anything Model (SAM) integrated with a YOLO11 base model. This method significantly reduces reliance on physical sensors and manual data processing, presenting a major advancement in "Agricultural AI". The synthetic, auto-annotated dataset was used to train the YOLO11 model for apple instance segmentation, which was then validated on real orchard images. The automatically generated annotations achieved a Dice coefficient of 0.9513 and an IoU of 0.9303, validating the accuracy and overlap of the mask annotations. All YOLO11 configurations, trained solely on these synthetic datasets with automated annotations, accurately recognized and delineated apples, highlighting the method's efficacy. Specifically, the YOLO11m-seg configuration achieved a mask precision of 0.902 and a mask mAP@50 of 0.833 on test images collected from a commercial orchard. Additionally, the YOLO11l-seg configuration outperformed the other models in validation on 40 LLM-generated images, achieving the highest mask precision and mAP@50.

Keywords: YOLO, SAM, SAMv2, YOLO11, YOLOv11, Segment Anything, YOLO-SAM
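The Dice coefficient and IoU reported above are both overlap measures between a generated mask and a reference mask. The following is a minimal sketch of how they can be computed for one pair of binary masks. The function name `dice_and_iou`, the nested-list mask format, and the convention for two empty masks are illustrative assumptions, not the authors' evaluation code.

```python
def dice_and_iou(pred_mask, ref_mask):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; masks must have the same shape.
    """
    pred = [bool(v) for row in pred_mask for v in row]
    ref = [bool(v) for row in ref_mask for v in row]
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(p and r for p, r in zip(pred, ref))
    union = sum(p or r for p, r in zip(pred, ref))
    total = sum(pred) + sum(ref)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0, 1.0

    dice = 2.0 * intersection / total  # 2|A ∩ B| / (|A| + |B|)
    iou = intersection / union         # |A ∩ B| / |A ∪ B|
    return dice, iou


# Two 2x4 masks, each with 6 foreground pixels, overlapping on 4 pixels.
pred = [[1, 1, 1, 0],
        [1, 1, 1, 0]]
ref = [[0, 1, 1, 1],
       [0, 1, 1, 1]]
dice, iou = dice_and_iou(pred, ref)
print(f"Dice={dice:.4f}, IoU={iou:.4f}")  # Dice=0.6667, IoU=0.5000
```

Dataset-level figures such as those quoted in the abstract would typically come from aggregating these per-mask values over all annotated instances; the exact aggregation used in the paper is not stated in this section.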
