
A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics

David Rapado-Rincon, Akshay K. Burusa, Eldert J. van Henten, Gert Kootstra

TL;DR

This paper tackles robust 3D multi-object tracking for greenhouse robotics under occlusions, comparing two-stage 3D-SORT with single-stage MOT-DETR on a real tomato dataset. The authors use a 3D MOT framework with color images and point clouds, evaluating detection quality and association under various viewpoint sequences, including active perception. They find that single-stage MOT-DETR yields better overall tracking accuracy and data association, particularly as occlusion increases, while two-stage 3D-SORT benefits from stronger detectors. Active perception further boosts performance for both methods, highlighting practical considerations for deploying greenhouse robotic systems.

Abstract

With the current demand for automation in the agro-food industry, accurately detecting and localizing relevant objects in 3D is essential for successful robotic operations. However, this is challenging due to the presence of occlusions. Multi-view perception approaches allow robots to overcome occlusions, but a tracking component is needed to associate the objects detected by the robot over multiple viewpoints. Multi-object tracking (MOT) algorithms can be categorized into two-stage and single-stage methods. Two-stage methods tend to be simpler to adapt and implement for custom applications, while single-stage methods are more complex, end-to-end tracking methods that can yield better results in occluded situations at the cost of requiring more training data. The potential advantages of single-stage methods over two-stage methods depend on the complexity of the sequence of viewpoints that a robot needs to process. In this work, we compare a 3D two-stage MOT algorithm, 3D-SORT, against a 3D single-stage MOT algorithm, MOT-DETR, on three types of viewpoint sequences with varying levels of complexity. The sequences represent simpler and more complex motions that a robot arm can perform in a tomato greenhouse. Our experiments in a tomato greenhouse show that the single-stage algorithm consistently yields better tracking accuracy, especially in the more challenging sequences where objects are fully occluded or not visible in several viewpoints.

Paper Structure

This paper contains 7 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Robotic system used for data collection. An ABB IRB1200 robot arm is mounted on a mobile platform that moves along the rails of the greenhouse rows. A scissor-like cutting and gripping tool and an Intel RealSense L515 camera are mounted on the robot's end-effector.
  • Figure 2: Left. Illustration of the path followed by the robot to collect viewpoints of real plants. Right. Example of a viewpoint on a plant.
  • Figure 3: 3D-SORT (top). First, the color image is processed by the object detection algorithm. The resulting detections are combined with the point cloud to estimate a 3D position for each detected object; this position serves as the re-ID feature used in the data association step. The Hungarian algorithm then associates the positions of newly detected objects with the positions of previously tracked objects. MOT-DETR (bottom). Color images and point clouds are processed jointly to detect objects together with their class and re-ID features, which are learned, black-box features. These re-ID features are then passed to a Hungarian-based data association algorithm.
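Both pipelines in Figure 3 end with a Hungarian assignment between current detections and existing tracks. They differ mainly in what the assignment cost is computed on: 3D positions for 3D-SORT and re-ID embeddings for MOT-DETR. The Python sketch below illustrates that shared step under stated assumptions. It uses a pure-Python Kuhn-Munkres (Hungarian) solver and either Euclidean distance between 3D positions or cosine distance between embeddings as the cost. It also applies a hypothetical gating threshold `max_cost` above which a match is rejected. The function names, the threshold, and all example values are illustrative and are not taken from the paper's implementations.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vec = Sequence[float]


def hungarian(cost: List[List[float]]) -> Dict[int, int]:
    """Minimum-cost assignment (Kuhn-Munkres) for an n x m cost matrix with n <= m.

    Returns {row_index: column_index}; every row receives exactly one column.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials (1-based)
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row assigned to column j (0 = free)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment: flip assignments along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j]}


def euclidean(a: Vec, b: Vec) -> float:
    """Cost between 3D positions (3D-SORT-style association)."""
    return math.dist(a, b)


def cosine_distance(a: Vec, b: Vec) -> float:
    """Cost between re-ID embeddings (MOT-DETR-style association)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def associate(tracks: Dict[int, Vec], detections: List[Vec],
              dist: Callable[[Vec, Vec], float],
              max_cost: float) -> Tuple[Dict[int, int], List[int]]:
    """Return ({detection_index: track_id}, [unmatched detection indices])."""
    ids = list(tracks)
    if not ids or not detections:
        return {}, list(range(len(detections)))
    cost = [[dist(d, tracks[t]) for t in ids] for d in detections]  # rows = detections
    if len(detections) <= len(ids):
        pairs = hungarian(cost)
    else:  # hungarian() needs rows <= columns, so solve the transposed problem
        transposed = [list(col) for col in zip(*cost)]
        pairs = {d: t for t, d in hungarian(transposed).items()}
    # Hypothetical gating: reject assignments whose cost exceeds max_cost.
    matches = {d: ids[pairs[d]] for d in sorted(pairs) if cost[d][pairs[d]] <= max_cost}
    unmatched = [d for d in range(len(detections)) if d not in matches]
    return matches, unmatched


# 3D-SORT-style: tracked tomato positions in metres (made-up values).
tracks = {0: (0.10, 0.50, 1.00), 1: (0.40, 0.55, 1.10)}
detections = [(0.41, 0.54, 1.09), (0.11, 0.50, 1.01), (0.80, 0.20, 0.90)]
matches, new_dets = associate(tracks, detections, euclidean, max_cost=0.05)
print(matches, new_dets)  # {0: 1, 1: 0} [2]

# MOT-DETR-style: toy 2-D embeddings (real re-ID features are learned and higher-dimensional).
emb_tracks = {7: (1.0, 0.0), 9: (0.0, 1.0)}
emb_dets = [(0.1, 0.9), (0.95, 0.05)]
re_matches, re_new = associate(emb_tracks, emb_dets, cosine_distance, max_cost=0.3)
print(re_matches, re_new)  # {0: 9, 1: 7} []
```

In a full tracker, unmatched detections (index 2 in the first example) would typically start new tracks, and matched tracks would have their stored position or embedding updated. How each of the two methods handles these steps, and whether it gates matches at all, is not specified in this summary.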