TomatoScanner: phenotyping tomato fruit based on only RGB image

Xiaobei Zhao, Xiangrong Zeng, Yihang Ma, Pengjin Tang, Xiang Li

TL;DR

TomatoScanner addresses the need for non-contact, RGB-only tomato phenotyping by predicting fruit width, height, vertical area, and volume from RGB images. It fuses pixel-level features from EdgeYOLO instance segmentation with depth features from the monocular depth estimator Depth Pro via Pixel-Depth Feature Fusion, and improves edge accuracy through three modules: EdgeAttention, EdgeLoss, and EdgeBoost. The approach achieves median relative errors of 5.63% for width, 7.03% for height, and −0.64% for vertical area; volume is harder, at 37.06%. EdgeYOLO stays lightweight, with a weights size of 48.7 M and a speed of 76.34 FPS. The work provides open-source code and a Tomato Phenotype Dataset, highlights strengths in automation and practicality, and outlines future work toward deploying TomatoScanner on autonomous greenhouse platforms, with ongoing improvements to volume estimation and pose robustness.
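
The median relative errors quoted above can be made concrete with a small sketch. The summary does not define the sign convention; the negative value for vertical area suggests a signed error, so the code below assumes (predicted − measured) / measured. The function name and the sample values are hypothetical and are not taken from the paper or its dataset.

```python
from statistics import median


def median_relative_error(predicted, measured):
    """Median of signed relative errors, in percent.

    Assumed convention (not confirmed by the source):
    error = (predicted - measured) / measured * 100.
    """
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [(p - m) / m * 100.0 for p, m in zip(predicted, measured)]
    return median(errors)


# Hypothetical fruit widths in millimetres, for illustration only.
predicted_mm = [52.0, 61.0, 47.5]
measured_mm = [50.0, 62.0, 50.0]
print(round(median_relative_error(predicted_mm, measured_mm), 2))
```

With a signed convention, over- and under-estimates can cancel, so a median close to zero does not by itself mean that individual errors are small.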

Abstract

In tomato greenhouses, phenotypic measurement helps researchers and farmers monitor crop growth and thereby control environmental conditions precisely and in time, leading to better quality and higher yield. Traditional phenotyping relies mainly on manual measurement, which is accurate but inefficient and, more importantly, endangers the health and safety of the people doing it. Several studies have explored computer vision-based methods to replace manual phenotyping. However, 2D-based methods need extra calibration, damage the fruit, or can measure only a limited set of less meaningful traits. 3D-based methods need an extra depth camera, which is too expensive for most farmers. In this paper, we propose a non-contact tomato fruit phenotyping method, titled TomatoScanner, for which an RGB image is the only input required. First, the pixel feature is extracted by instance segmentation with our proposed EdgeYOLO, after preprocessing by individual separation and pose correction. Second, the depth feature is extracted by depth estimation with Depth Pro. Third, the pixel and depth features are fused to output real-world phenotype results. We build the Tomato Phenotype Dataset to test TomatoScanner, which achieves median relative errors of 5.63%, 7.03%, -0.64% and 37.06% on width, height, vertical area and volume, respectively. We propose three modules - EdgeAttention, EdgeLoss and EdgeBoost - and add them to EdgeYOLO to enhance segmentation accuracy along fruit edges. Precision improves from 0.943 to 0.986, and mean Edge Error (mEE) falls from 5.641% to 2.963%. Meanwhile, EdgeYOLO remains lightweight and efficient, with a weights size of 48.7 M and a speed of 76.34 FPS. Code and datasets: https://github.com/AlexTraveling/TomatoScanner.
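
As an illustration of how a pixel measurement and a depth estimate can be combined into a real-world size, here is a minimal sketch based on the standard pinhole camera model. This is an assumption made for illustration, not the paper's Pixel-Depth Feature Fusion: the abstract does not give the fusion formula. The function name, the parameters, and the sample numbers are all hypothetical, and the focal length is assumed to be known in pixels.

```python
def fruit_size_from_pixels(width_px, height_px, mask_area_px,
                           depth_mm, focal_length_px):
    """Convert pixel measurements to millimetres with a pinhole model.

    Assumption: the fruit is treated as lying at a single depth,
    so one pixel spans depth_mm / focal_length_px millimetres.
    """
    if focal_length_px <= 0 or depth_mm <= 0:
        raise ValueError("depth and focal length must be positive")
    mm_per_px = depth_mm / focal_length_px
    return {
        "width_mm": width_px * mm_per_px,
        "height_mm": height_px * mm_per_px,
        # Area scales with the square of the linear scale factor.
        "area_mm2": mask_area_px * mm_per_px ** 2,
    }


# Hypothetical values: a 200 x 180 px fruit mask covering 28000 px,
# seen 500 mm from a camera with a 2000 px focal length.
size = fruit_size_from_pixels(200, 180, 28000, 500.0, 2000.0)
print(size)
```

Volume is left out on purpose: recovering it from a single view needs an extra shape assumption, which may be part of why the reported volume error is much larger than the width and height errors.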

Paper Structure

This paper contains 31 sections, 11 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: A simple demonstration of TomatoScanner: (a) is the input RGB image. (b) and (c) are the output phenotyping results - width, height, vertical area and volume - of two fruits, respectively. (Zoom in for better observation)
  • Figure 2: TomatoScanner architecture: Individual Separation module is illustrated in the top left. Pose Correction module is illustrated in the top center. Instance Segmentation module is illustrated in the top right. Depth Estimation module is illustrated in the bottom left. Pixel-Depth Feature Fusion module is illustrated in the bottom right. (Zoom in for better observation)
  • Figure 3: EdgeYOLO architecture: Backbone and head are illustrated in the center; EdgeAttention is illustrated in the bottom right; EdgeLoss is illustrated in the top right; EdgeBoost is illustrated in the bottom left. (Zoom in for better observation)
  • Figure 4: mEE calculation illustration
  • Figure 5: Tomato Phenotype Dataset
  • ...and 6 more figures