Few-Shot Fruit Segmentation via Transfer Learning

Jordan A. James, Heather K. Manching, Amanda M. Hulse-Kemp, William J. Beksi

TL;DR

The paper addresses few-shot semantic segmentation of in-field fruits when labeled data is scarce. It introduces a specialized pre-training strategy that uses the CitDet citrus dataset to bootstrap learning, then transfers this knowledge to the MinneApple apple segmentation target task. A lightweight three-branch decoder (Spatial, Context, ADB) with BAG fusion learns fruit shapes and boundaries to enable effective transfer across domains. Evaluations on MinneApple show improved few-shot and zero-shot performance and highlight boundary refinement, indicating practical potential for automated fruit harvesting and yield estimation.

Abstract

Advancements in machine learning, computer vision, and robotics have paved the way for transformative solutions in various domains, particularly in agriculture. For example, accurate identification and segmentation of fruits from field images play a crucial role in automating jobs such as harvesting, disease detection, and yield estimation. However, achieving robust and precise infield fruit segmentation remains a challenging task since large amounts of labeled data are required to handle variations in fruit size, shape, color, and occlusion. In this paper, we develop a few-shot semantic segmentation framework for infield fruits using transfer learning. Concretely, our work is aimed at addressing agricultural domains that lack publicly available labeled data. Motivated by similar success in urban scene parsing, we propose specialized pre-training using a public benchmark dataset for fruit transfer learning. By leveraging pre-trained neural networks, we achieve accurate semantic segmentation of fruit in the field with only a few labeled images. Furthermore, we show that models with pre-training learn to distinguish between fruit still on the trees and fruit that has fallen on the ground, and that they can effectively transfer this knowledge to the target fruit dataset.

Paper Structure (18 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Qualitative results of our two-shot apple segmentation with specialized pre-training on CitDet [james2024citdet] and generalized pre-training on ImageNet [deng2009imagenet]. Predicted fruit on the tree are colored red while predicted fruit on the ground are colored blue. Models pre-trained on CitDet are capable of distinguishing between fruit on the ground and fruit on the tree using only small amounts of labeled data to learn from.
  • Figure 2: The architecture of our few-shot fruit segmentation network. The encoder, ResNet-18, is colored yellow, the decoder is blue, and the auxiliary boundary head (train only) is green. The "Basic Block" is the basic residual block in ResNet-18, and the "Bottleneck" is the bottleneck block from ResNet-50. The ConvBNRelu blocks use $3 \times 3$ kernels for the convolutional layers.
  • Figure 3: A visualization of the data augmentation used for few-shot semantic segmentation of in-orchard apples. The original image is shown in (a) and the resulting augmented image crops are shown in (b). Each panel in (b) shows a random crop at a different scale: top left at 0.75, top right at 1.0, bottom left at 1.25, and bottom right at 1.5.
  • Figure 4: Qualitative results for few-shot and full-shot apple segmentation for each pre-training method. Predicted fruit on the tree are colored cyan and predicted fruit on the ground are colored magenta. The original input is shown in (a), the ground-truth labels in (b), and the predictions in (c). Each row of predictions corresponds to a pre-training method: the top row is CitDet, the middle row is ImageNet, and the bottom row is no pre-training. Each column corresponds to the number of training images used for fine-tuning (i.e., 2, 4, and 670), increasing from left to right (best viewed zoomed in).
  • Figure 5: The mIoU performance on the MinneApple test set versus the number of training images used from the MinneApple training set. Standard pre-training methods are marked as dots, and our specialized pre-training is marked as blue stars. Our method achieves higher mIoU than the standard approaches, especially with only a few annotated images.
  • ...and 1 more figures
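
The Figure 5 caption reports performance as mIoU (mean intersection over union). As a reminder of what that metric measures, here is a minimal pure-Python sketch. It assumes a hypothetical three-class labeling (0 = background, 1 = fruit on the tree, 2 = fruit on the ground) and computes the score on a single label map; neither the class indices nor the function name `mean_iou` come from the paper, and the authors' evaluation protocol (for example, how statistics are accumulated over the test set) may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are equally sized 2D lists of integer class ids.
    The class ids used in the example are illustrative assumptions.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: counts once toward both the
                # intersection and the union of that class.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it belongs to the union of the
                # predicted class and of the ground-truth class.
                union[p] += 1
                union[t] += 1
    # Skip classes absent from both maps to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps (hypothetical data, not from the paper).
target = [[0, 0, 1],
          [0, 2, 1]]
pred = [[0, 1, 1],
        [0, 2, 2]]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In this toy example the per-class IoUs are 2/3 (background), 1/3 (tree fruit), and 1/2 (ground fruit). Averaging per class rather than per pixel keeps the large background region from dominating the score, which matters when fruit pixels are a small fraction of the image.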