Accurate Cutting-point Estimation for Robotic Lychee Harvesting through Geometry-aware Learning

Gengming Zhang; Hao Cao; Kewei Hu; Yaoqiang Pan; Yuqin Deng; Hongjun Wang; Hanwen Kang

TL;DR

This work tackles the problem of accurately localising lychee picking points in unstructured orchards, where 2D detection struggles because of occlusion and the complex geometry of branches, leaves and fruit. It introduces Fcaf3d-lychee, a 3D point-cloud detector augmented with Squeeze-and-Excitation (SE) attention to improve small-target feature extraction, trained on a multi-view Azure Kinect TOF dataset built with three-view stitching. The key contributions are adapting Fcaf3d with SE-Res enhancements for lychee, assembling a dedicated lychee picking-point dataset, and validating 3D localisation with errors within $\pm 1.5\text{ cm}$ and an $F_{1}$ score of $88.57\%$, outperforming baselines such as Fcaf3d, Votenet, and Tr3d. The method enables end-to-end 3D picking-point localisation under occlusion, supporting practical robotic harvesting of lychees in real orchards.

Abstract

Accurately identifying lychee picking points in unstructured orchard environments and obtaining their coordinates is critical to the success of lychee-picking robots. However, traditional two-dimensional (2D) image-based object detection methods often struggle with the complex geometric structures of branches, leaves and fruits, which leads to incorrectly determined picking points. In this study, we propose Fcaf3d-lychee, a network model designed specifically for the accurate localisation of lychee picking points. Point cloud data of lychee picking points in natural environments are acquired with Microsoft's Azure Kinect DK time-of-flight (TOF) camera and combined through multi-view stitching. We augment the Fully Convolutional Anchor-Free 3D Object Detection (Fcaf3d) model with a squeeze-and-excitation (SE) module, an attention mechanism modelled on human visual attention, to improve feature extraction for lychee picking points. The trained network model is evaluated on a test set of lychee picking points and achieves an F1 score of 88.57%, significantly outperforming existing models. Subsequent three-dimensional (3D) position detection of picking points in real lychee orchard environments yields high accuracy, even under varying degrees of occlusion. Localisation errors of lychee picking points are within 1.5 cm in all directions, demonstrating the robustness and generality of the model.
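The squeeze-and-excitation step mentioned in the abstract can be illustrated with a minimal sketch. It follows the general SE recipe (global average "squeeze", a two-layer bottleneck "excitation", then channel-wise rescaling) in plain Python. It is not the authors' implementation: the function name `se_block`, the toy sizes `C`, `N`, `r`, the random weights, and the omission of bias terms are all assumptions made for illustration, and how the module is wired into the Fcaf3d backbone is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def se_block(features, w1, w2):
    """Apply squeeze-and-excitation to per-channel features.

    features: list of C channels, each a list of N activations
              (for example, one value per point or voxel).
    w1: (C // r) x C weights of the reduction layer (hypothetical values).
    w2: C x (C // r) weights of the expansion layer (hypothetical values).
    Bias terms are left out to keep the sketch short.
    """
    # Squeeze: a global average gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, step 2: expand back to C channels and squash into (0, 1).
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: reweight every channel by its gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates


# Toy example: C channels, N points, reduction ratio r (all assumed values).
random.seed(0)
C, N, r = 4, 6, 2
features = [[random.uniform(0, 1) for _ in range(N)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
scaled, gates = se_block(features, w1, w2)
```

Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another. In a trained network the learned weights decide which channels are suppressed, which is the sense in which the module emphasises features relevant to small targets.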

Paper Structure (27 sections, 10 equations, 9 figures, 5 tables)

Introduction
Related Works
Review on Image-Based 2D Target Detection in lychees
Review on Deep Learning-Based lychee 2D Target Detection Methods
Review on Deep Learning-Based 3D Target Detection Methods for Fruits
Materials and Methods
System overview
Hand-Eye vision model & Closed-Loop calibration method
Point cloud acquisition
Filtering
Stitching
Fcaf3d-lychee for lychee picking point detection
Squeeze-and-Excitation (SE) module
Lychee Picking Point Dataset
Data collection
...and 12 more sections

Figures (9)

Figure 1: Lychee picking point robot system and methods.
Figure 2: After color and statistical filter.
Figure 3: Fcaf3d network structure.
Figure 4: Structure of SE module.
Figure 5: Point cloud augmentation.
...and 4 more figures
