Weakly Supervised Panoptic Segmentation for Defect-Based Grading of Fresh Produce

Manuel Knott, Divinefavour Odion, Sameer Sontakke, Anup Karwa, Thijs Defraeye

TL;DR

The paper tackles defect-based grading of fresh produce under low-data conditions by using the Segment Anything Model (SAM) to generate dense panoptic masks from sparse annotations and then training a downstream panoptic segmentation model on those masks. The approach reduces manual annotation effort while enabling counting and sizing of visible defects on bananas, and is validated on 476 images with 1440 defects. Results show that SAM-generated masks largely align with human annotations and yield panoptic quality similar to that of fully supervised training, with some failure modes for small or elongated defects. The study demonstrates practical potential for defect quantification in low-data agricultural settings while outlining limitations and directions for improvement.

Abstract

Visual inspection for defect grading in agricultural supply chains is crucial but traditionally labor-intensive and error-prone. Automated computer vision methods typically require extensively annotated datasets, which are often unavailable in decentralized supply chains. We address this challenge by evaluating the Segment Anything Model (SAM) to generate dense panoptic segmentation masks from sparse annotations. These dense predictions are then used to train a supervised panoptic segmentation model. Focusing on banana surface defects (bruises and scars), we validate our approach using 476 field images annotated with 1440 defects. While SAM-generated masks generally align with human annotations, substantially reducing annotation effort, we explicitly identify failure cases associated with specific defect sizes and shapes. Despite these limitations, our approach offers practical estimates of defect number and relative size from panoptic masks, underscoring the potential and current boundaries of foundation models for defect quantification in low-data agricultural scenarios. GitHub: https://github.com/manuelknott/banana-defect-segmentation
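To make the idea of reading "defect number and relative size" off a panoptic mask concrete, here is a minimal sketch in plain Python. It assumes a hypothetical encoding in which every pixel carries a `(class_name, instance_id)` pair, and it assumes that the fruit area is the foreground-banana pixels plus the defect pixels lying on it. The class names, the data layout, and the helper `defect_stats` are illustrative assumptions and are not taken from the paper's repository.

```python
from collections import Counter

# Hypothetical class labels; the paper's actual label encoding may differ.
FOREGROUND = "foreground_banana"
DEFECT = "defect"


def defect_stats(panoptic):
    """Return (number of defects, {instance_id: relative area}).

    `panoptic` is a 2D list of (class_name, instance_id) tuples, one per pixel.
    Assumption: the fruit area is foreground-banana pixels plus defect pixels.
    """
    banana_pixels = 0
    defect_pixels = Counter()
    for row in panoptic:
        for cls, inst in row:
            if cls == FOREGROUND:
                banana_pixels += 1
            elif cls == DEFECT:
                defect_pixels[inst] += 1
    fruit_area = banana_pixels + sum(defect_pixels.values())
    # No division happens when there are no defects, so fruit_area > 0 here.
    relative = {inst: n / fruit_area for inst, n in defect_pixels.items()}
    return len(defect_pixels), relative


# Tiny 3x4 toy mask: one background column, one banana, two defects.
bg = ("background", 0)
fb = (FOREGROUND, 1)
d1 = (DEFECT, 1)
d2 = (DEFECT, 2)
mask = [
    [bg, fb, fb, fb],
    [bg, fb, d1, fb],
    [bg, d2, d2, fb],
]
count, relative = defect_stats(mask)
print(count, relative)
```

In this toy mask the fruit covers 9 pixels, so the two defects occupy 1/9 and 2/9 of the fruit area respectively.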

Paper Structure

This paper contains 25 sections, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Overview of our approach. We utilize the Segment Anything Model (SAM), a promptable visual foundation model for image segmentation, to generate dense annotations (pixel-wise class and instance labels) from coarse annotations (bounding boxes and reference points) without any model training involved. These dense annotations otherwise require tedious hand-annotation. Using these newly generated labels, we train a panoptic segmentation model to identify surface defects on banana fruits, specifically bruises and scars. Additionally, the model can differentiate between foreground banana fruits and those in the background. This enables us to determine the number and size of visible surface defects from photographs of banana fruits. We validate our approach using a dataset of 476 images and 1440 annotated defects.
  • Figure 2: Agreement between human-annotated and SAM-predicted masks by defect size (number of pixels). The x-axis shows binned size categories for defect sizes in pixels (per annotated mask), while $n$ denotes the number of samples in each bin. The y-axis shows the agreement between annotated and predicted masks (IoU). SAM fails to align with human annotations for small ($<\!100$ pixels) and very small ($<\!10$ pixels) defects. To understand the failure cases of larger defects ($>\!100$ pixels), we added exemplary visualizations in \ref{fig:sam-failure}. Those cases often show thin, long scars. We can also see that a larger SAM model size (ViT-H) leads to consistently higher agreement with human annotations across all defect sizes. An additional analysis of the impact of SAM model sizes can be found in \ref{fig:sam-ious}.
  • Figure 4: Annotated versus predicted defect counts per image. We predict the exact number of defects 36.2% of the time (green squares). 76.2% of the time, we predict correctly within a $\pm1$ tolerance (orange squares). Generally, our model tends to predict more defects in the images compared to the annotations.
  • Figure 5: Annotated versus predicted defect sizes. We pairwise match those defect instances with the highest IoU agreement between annotation and estimation (minimum 0.5) and calculate their sizes relative to the corresponding foreground banana masks (non-matchable defects are excluded in this analysis). Relative size prediction is generally accurate with a Pearson correlation of $r=0.96, n=793$. Each blue dot is one pair of defects, the orange line is a fitted regression line, and the black dashed line is the $x=y$ diagonal.
  • Figure 6: Example visualizations of annotated vs. predicted masks. Left: input image. Middle: annotation. Right: MaskFormer prediction. Red rectangles enclose defect instances. Segments are color-coded as follows: Foreground Banana, Background Banana, Defect,
  • ...and 3 more figures
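
The evaluation steps described in the captions of Figures 4 and 5 (count agreement with a $\pm1$ tolerance, pairing defects by IoU with a minimum of 0.5, and a Pearson correlation over matched pairs) can be sketched roughly as follows. Masks are represented as sets of `(row, col)` pixel coordinates. The greedy highest-IoU-first pairing is an assumption, since the captions only state that the instances with the highest IoU agreement are matched, and all function names are hypothetical.

```python
from math import sqrt


def iou(a, b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_defects(annotated, predicted, min_iou=0.5):
    """Greedily pair annotated and predicted masks, highest IoU first (assumed strategy)."""
    candidates = sorted(
        ((iou(a, p), i, j)
         for i, a in enumerate(annotated)
         for j, p in enumerate(predicted)),
        reverse=True,
    )
    used_a, used_p, pairs = set(), set(), []
    for score, i, j in candidates:
        if score < min_iou:
            break  # remaining candidates are all below the threshold
        if i not in used_a and j not in used_p:
            used_a.add(i)
            used_p.add(j)
            pairs.append((i, j, score))
    return pairs


def count_agreement(true_counts, pred_counts, tol=0):
    """Fraction of images whose predicted defect count is within `tol` of the annotation."""
    hits = sum(abs(t - p) <= tol for t, p in zip(true_counts, pred_counts))
    return hits / len(true_counts)


def pearson_r(xs, ys):
    """Pearson correlation coefficient; undefined (division by zero) for constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data: one annotated defect is recovered, one is missed.
ann = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
pred = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]
pairs = match_defects(ann, pred)

exact = count_agreement([2, 1, 3, 0], [2, 2, 5, 0], tol=0)
within_one = count_agreement([2, 1, 3, 0], [2, 2, 5, 0], tol=1)
print(pairs, exact, within_one)
```

Unmatched defects simply drop out of `pairs`, which mirrors the caption's note that non-matchable defects are excluded from the size analysis. Relative sizes of the matched pairs could then be passed to `pearson_r`.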