Synthetic Data Generation for Anomaly Detection on Table Grapes

Ionut Marian Motoi, Valerio Belli, Alberto Carpineto, Daniele Nardi, Thomas Alessandro Ciarfuglia

TL;DR

The paper tackles data scarcity in anomaly detection for table grapes by introducing a semi-automatic synthetic data generation pipeline that uses a Dual-Canny Edge Detection (DCED) filter to emphasize defect textures and the Segment Anything Model (SAM) to extract anomalous masks. By aligning and blending rotated/scaled anomalous berry segments onto healthy berry targets via PCA-based orientation and Poisson blending, the method creates realistic synthetic samples with minimal manual input, governed by parameters such as $K$, $wth_{min}$, $wth_{max}$, $nth_{min}$, and $nth_{max}$. Empirical results show that augmenting real data with synthetic samples improves balanced accuracy and F1-score for an anomaly classifier, while substituting real anomalies with synthetic ones can hurt performance; the gains are particularly evident when adding a moderate amount of synthetic data and when using multi-berry pasting. The approach demonstrates practicality for farmers and can generalize to other fruit types, offering a scalable way to mitigate data scarcity in agricultural anomaly detection.

Abstract

Early detection of illnesses and pest infestations in fruit cultivation is critical for maintaining yield quality and plant health. Computer vision and robotics are increasingly employed for the automatic detection of such issues, particularly using data-driven solutions. However, the rarity of these problems makes acquiring and processing the necessary data to train such algorithms a significant obstacle. One solution to this scarcity is the generation of synthetic high-quality anomalous samples. While numerous methods exist for this task, most require highly trained individuals for setup. This work addresses the challenge of generating synthetic anomalies in an automatic fashion that requires only an initial collection of normal and anomalous samples from the user - a task that is straightforward for farmers. We demonstrate the approach in the context of table grape cultivation. Specifically, based on the observation that normal berries present relatively smooth surfaces, while defects result in more complex textures, we introduce a Dual-Canny Edge Detection (DCED) filter. This filter emphasizes the additional texture indicative of diseases, pest infestations, or other defects. Using segmentation masks provided by the Segment Anything Model, we then select and seamlessly blend anomalous berries onto normal ones. We show that the proposed dataset augmentation technique improves the accuracy of an anomaly classifier for table grapes and that the approach can be generalized to other fruit types.
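
The selection step that DCED supports can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the two Canny passes have already produced edge maps, that the DCED output is the set of pixels found by the wide-threshold pass but not by the narrow one, and that candidate berry masks are ranked by the fraction of their pixels that are DCED edges. The function names, the set-of-pixels representation, and the toy data are all hypothetical.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) image coordinate


def dced_difference(wide_edges: Set[Pixel], narrow_edges: Set[Pixel]) -> Set[Pixel]:
    """Assumed DCED output: edges kept by the wide pass but not by the narrow pass."""
    return wide_edges - narrow_edges


def edge_pixel_ratio(mask: Set[Pixel], edges: Set[Pixel]) -> float:
    """Fraction of the mask's pixels that are edge pixels (0.0 for an empty mask)."""
    if not mask:
        return 0.0
    return len(mask & edges) / len(mask)


def pick_anomalous_mask(masks: Dict[str, Set[Pixel]], edges: Set[Pixel]) -> str:
    """Name of the mask with the highest edge pixel ratio."""
    return max(masks, key=lambda name: edge_pixel_ratio(masks[name], edges))


# Toy data (hypothetical): two edge maps and two 2x2 berry masks.
wide = {(0, 0), (0, 1), (1, 1), (2, 2), (3, 3)}
narrow = {(0, 0), (3, 3)}
texture = dced_difference(wide, narrow)

masks = {
    "berry_a": {(0, 0), (0, 1), (1, 0), (1, 1)},  # 2 of 4 pixels are texture edges
    "berry_b": {(2, 2), (2, 3), (3, 2), (3, 3)},  # 1 of 4 pixels is a texture edge
}
chosen = pick_anomalous_mask(masks, texture)
print(chosen)  # prints "berry_a"
```

In a real pipeline the two edge maps could come from two calls to an edge detector such as OpenCV's `cv2.Canny`, one per threshold pair, and the masks from a segmentation model. Both choices are assumptions here, not details confirmed by this summary.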

Paper Structure

This paper contains 11 sections, 1 equation, 6 figures, 3 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Robotic harvesting, as in the EU CANOPIES project, requires high accuracy in detecting anomalous fruits since false negatives could lead to the spread of pest infestations and diseases throughout the orchard. However, given the high variability of possible anomalies and the relative scarcity of naturally occurring examples, synthetic data generation has become an important aspect of addressing this challenge.
  • Figure 2: The upper section of the diagram illustrates the preliminary stages of data collection and DCED parameter tuning. The lower section details the synthetic sample generation process. Starting with a pair of real samples from the training set, the system uses SAM to extract their respective masks. The anomalous berry is identified by taking the mask with the highest edge pixel ratio, as determined by the tuned DCED. In contrast, the normal berry is randomly selected. After rotating, scaling, and shifting the anomalous berry, we compute the intersection of the two masks. Finally, we employ Poisson blending to merge the berries and generate a new synthetic sample.
  • Figure 3: Examples of table grape image patches and the corresponding edge maps extracted with a Canny edge detector. On the left, patch (a) contains berries in good shape, and (b) shows its edge detection result: the edges follow the external contours of the berries, while the internal surfaces are smooth. Image (c) shows a patch containing anomalous berries. The corresponding edge detection result (d) exhibits a more complex edge map, indicating a rougher texture due to defects.
  • Figure 4: Dual-Canny Edge Detection (DCED) example: (a) is the starting anomalous sample; (b) and (c) show the edges extracted by Canny edge detection (CED) with different thresholds (the wide thresholds admit more edges, the narrow thresholds are more selective); (d) shows the difference between the two edge maps. While some border edges are retained, many of the remaining edges belong to the anomalous texture.
  • Figure 5: PCA is applied to the two berry masks to determine their primary axis of variation (represented by the green and pink arrows) and compute the angle $\phi$ necessary for aligning the berries. The anomalous grape is then rotated, scaled, and translated to match the normal berry. Finally, the intersection of the two masks is computed, and Poisson blending is employed to seamlessly merge the two images, creating a new anomalous berry.
  • ...and 1 more figure
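
The PCA alignment described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each mask is represented as a list of (x, y) pixel coordinates, the principal axis is obtained from the closed-form orientation of a 2x2 covariance matrix, and the rotation angle is wrapped to a half-turn because a principal axis has no preferred direction. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) coordinate of a mask pixel


def principal_axis_angle(points: Sequence[Point]) -> float:
    """Angle in radians, within [-pi/2, pi/2], of the first principal axis."""
    n = len(points)
    if n == 0:
        raise ValueError("mask has no pixels")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form direction of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def alignment_angle(anomalous: Sequence[Point], normal: Sequence[Point]) -> float:
    """Rotation to apply to the anomalous mask so that its axis matches the normal one."""
    phi = principal_axis_angle(normal) - principal_axis_angle(anomalous)
    # An axis is undirected, so wrap the result to [-pi/2, pi/2).
    return (phi + math.pi / 2) % math.pi - math.pi / 2


# Toy masks (hypothetical): a horizontal and a diagonal line of pixels.
horizontal: List[Point] = [(float(t), 0.0) for t in range(5)]
diagonal: List[Point] = [(float(t), float(t)) for t in range(5)]
phi = alignment_angle(horizontal, diagonal)
print(math.degrees(phi))
```

The sketch uses mathematical (x, y) axes. With image coordinates, where rows grow downward, the sign of the angle flips. Scaling, translation, and the blending step are left out. OpenCV's `cv2.seamlessClone` is one widely available Poisson blending routine, although this summary does not say which implementation the authors used.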