Joint 3D Point Cloud Segmentation using Real-Sim Loop: From Panels to Trees and Branches

Tian Qiu, Ruiming Du, Nikolai Spine, Lailiang Cheng, Yu Jiang

TL;DR

This paper tackles the challenge of joint 3D segmentation across panels, trees, and branches in orchards, where data scarcity and multi-stage pipelines hinder robotic deployment. It introduces L-TreeGen to generate realistic large-scale apple trees with configurable virtual sensing, and a Joint P2TB model that performs semantic, tree-instance, and branch-instance segmentation in one pass using a 3D sparse Unet encoder and a joint Transformer decoder. A Real2Sim Sim2Real loop enables zero-shot transfer to real-world data, with extensive ablation showing the impact of Virtual Laser Scanning fidelity and architectural refinements on accuracy and efficiency. The results demonstrate strong accuracy, reduced memory footprint, and practical potential for automated orchard operations and digital twins.

Abstract

Modern orchards are planted in structured rows with distinct panel divisions to improve management. Accurate and efficient joint segmentation of point clouds from Panel to Tree and Branch (P2TB) is essential for robotic operations. However, most current segmentation methods focus on single instance segmentation and depend on a sequence of deep networks to perform joint tasks. This strategy hinders the use of hierarchical information embedded in the data, leading to both error accumulation and increased costs for annotation and computation, which limits its scalability for real-world applications. In this study, we proposed a novel approach that incorporated a Real2Sim L-TreeGen for training data generation and a joint model (J-P2TB) designed for the P2TB task. The J-P2TB model, trained on the generated simulation dataset, was used for joint segmentation of real-world panel point clouds via zero-shot learning. Compared to representative methods, our model outperformed them in most segmentation metrics while using 40% fewer learnable parameters. This Sim2Real result highlighted the efficacy of L-TreeGen in model training and the performance of J-P2TB for joint segmentation, demonstrating its strong accuracy, efficiency, and generalizability for real-world applications. These improvements would not only greatly benefit the development of robots for automated orchard operations but also advance digital twin technology.

Paper Structure

This paper contains 29 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The proposed approach performs joint semantic, tree instance, and branch instance segmentation in one run by leveraging the Real-Sim loop. 3D annotation of dense apple trees is extremely error-prone and time-consuming, especially with incomplete data. Thanks to our L-TreeGen, our Sim2Real approach in a zero-shot setting outperformed existing Real2Real approaches, without the need for human annotation or multiple sequential networks.
  • Figure 2: Our network architecture consists of a 3D sparse Unet encoder and organ-specific transformer decoders. The panel point cloud is fed into a shared encoder that forwards the learned embeddings to separate decoders for joint prediction. The embeddings are refined by a small network to extract fine-grained features prior to the P2B decoder. The P2T instance-only decoder generates the tree instance, and the P2B decoder generates the semantic class $\in \{\text{trunk}, \text{branch}\}$ and branch instance by leveraging the tree instance prediction as a hierarchical prior.
  • Figure 3: A simulated orchard and a representative simulated panel processed by VLS at various scanner resolutions, showing (1) occlusion effects due to overlapping and (2) no-hit effects due to object size. We colorized the data in (b) based on the point density.
  • Figure 4: Qualitative segmentation results on a test panel from the PB1 approach (Real2Real) and ours (Sim2Real). The panoptic segmentation combines the semantic class (trunk/branch) and branch instance prediction. We highlighted the PB1 branch segmentation errors in black boxes and our semantic segmentation errors in blue boxes.
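
The captions for Figures 2 and 4 describe per-point outputs (a tree instance, a trunk/branch semantic class, and a branch instance) that are combined into a panoptic result. The sketch below is only an illustration of how such a combination could be encoded, not the authors' implementation: the function name `combine_panoptic`, the 0 = trunk / 1 = branch encoding, and the `max_branches` packing constant are all assumptions made for this example.

```python
import numpy as np


def combine_panoptic(semantic, branch_instance, tree_instance, max_branches=1000):
    """Pack three per-point predictions into one panoptic ID per point.

    Assumed (hypothetical) encoding, not taken from the paper:
      semantic:        0 = trunk, 1 = branch
      branch_instance: non-negative branch index, ignored for trunk points
      tree_instance:   non-negative tree index (the hierarchical prior)
    """
    semantic = np.asarray(semantic)
    branch_instance = np.asarray(branch_instance)
    tree_instance = np.asarray(tree_instance)

    # Slot 0 within a tree is reserved for its trunk;
    # branch k of that tree occupies slot k + 1.
    slot = np.where(semantic == 0, 0, branch_instance + 1)
    if np.any(slot >= max_branches):
        raise ValueError("branch index does not fit in max_branches slots")

    # The tree ID selects a block of max_branches IDs, so two branches with
    # the same index on different trees never share a panoptic ID.
    return tree_instance * max_branches + slot


if __name__ == "__main__":
    sem = [0, 1, 1, 0]      # trunk, branch, branch, trunk
    branch = [5, 0, 2, 7]   # values at trunk points are ignored
    tree = [0, 0, 1, 1]
    print(combine_panoptic(sem, branch, tree))
```

Packing the tree ID into the high part of the label keeps the hierarchy recoverable: integer division by `max_branches` returns the tree, and the remainder distinguishes trunk (0) from individual branches.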