A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics
David Rapado-Rincon, Akshay K. Burusa, Eldert J. van Henten, Gert Kootstra
TL;DR
This paper tackles robust 3D multi-object tracking for greenhouse robotics under occlusions, comparing two-stage 3D-SORT with single-stage MOT-DETR on a real tomato dataset. The authors use a 3D MOT framework with color images and point clouds, evaluating detection quality and association under various viewpoint sequences, including active perception. They find that single-stage MOT-DETR yields better overall tracking accuracy and data association, particularly as occlusion increases, while two-stage 3D-SORT benefits from stronger detectors. Active perception further boosts performance for both methods, highlighting practical considerations for deploying greenhouse robotic systems.
Abstract
With the current demand for automation in the agro-food industry, accurately detecting and localizing relevant objects in 3D is essential for successful robotic operations. However, this is challenging due to the presence of occlusions. Multi-view perception approaches allow robots to overcome occlusions, but a tracking component is needed to associate the objects detected by the robot over multiple viewpoints. Multi-object tracking (MOT) algorithms can be categorized into two-stage and single-stage methods. Two-stage methods tend to be simpler to adapt and implement for custom applications, while single-stage methods offer a more complex, end-to-end tracking approach that can yield better results in occluded situations at the cost of requiring more training data. The potential advantages of single-stage methods over two-stage methods depend on the complexity of the sequence of viewpoints that a robot needs to process. In this work, we compare a 3D two-stage MOT algorithm, 3D-SORT, against a 3D single-stage MOT algorithm, MOT-DETR, on three types of sequences with varying levels of complexity. The sequences represent simpler and more complex motions that a robot arm can perform in a tomato greenhouse. Our experiments in a tomato greenhouse show that the single-stage algorithm consistently yields better tracking accuracy, especially in the more challenging sequences where objects are fully occluded or not visible for several viewpoints.
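To make the two-stage (detect, then associate) idea concrete, the following is a minimal, hypothetical Python sketch of a generic tracking-by-detection loop over robot viewpoints. It is not 3D-SORT and not the authors' implementation. It assumes the first stage (a detector) already outputs 3D object centres in a common world frame, and it swaps in a simple greedy nearest-neighbour association for the second stage. The names `TwoStageTracker`, `associate`, `max_dist` (distance gate) and `max_misses` (how many viewpoints a track may go unseen before deletion) are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]  # assumed (x, y, z) centre in a shared world frame


@dataclass
class Track:
    track_id: int
    position: Point3D
    misses: int = 0  # consecutive viewpoints without a matched detection


def associate(tracks: List[Track], detections: List[Point3D], max_dist: float):
    """Greedy nearest-neighbour matching (hypothetical stand-in for a real association step)."""
    pairs = []
    for ti, track in enumerate(tracks):
        for di, det in enumerate(detections):
            dist = math.dist(track.position, det)
            if dist <= max_dist:  # gate: ignore pairs that are too far apart
                pairs.append((dist, ti, di))
    pairs.sort()  # closest pairs are matched first
    used_t, used_d, matches = set(), set(), []
    for _, ti, di in pairs:
        if ti in used_t or di in used_d:
            continue
        used_t.add(ti)
        used_d.add(di)
        matches.append((ti, di))
    unmatched = [di for di in range(len(detections)) if di not in used_d]
    return matches, unmatched


class TwoStageTracker:
    def __init__(self, max_dist: float = 0.05, max_misses: int = 3):
        self.max_dist = max_dist
        self.max_misses = max_misses
        self.tracks: List[Track] = []
        self.next_id = 0

    def update(self, detections: List[Point3D]) -> Dict[int, Point3D]:
        """Process the detections of one viewpoint; return tracks seen in it."""
        matches, unmatched = associate(self.tracks, detections, self.max_dist)
        matched = set()
        for ti, di in matches:
            self.tracks[ti].position = detections[di]  # no motion/state filtering here
            self.tracks[ti].misses = 0
            matched.add(ti)
        for ti, track in enumerate(self.tracks):
            if ti not in matched:
                track.misses += 1  # object occluded or outside the view
        for di in unmatched:  # unexplained detections start new tracks
            self.tracks.append(Track(self.next_id, detections[di]))
            self.next_id += 1
        # Tracks unseen for too many viewpoints are dropped.
        self.tracks = [t for t in self.tracks if t.misses <= self.max_misses]
        return {t.track_id: t.position for t in self.tracks if t.misses == 0}


# Demo: the second object is hidden for two viewpoints, then reappears.
tracker = TwoStageTracker(max_dist=0.05, max_misses=1)
view1 = tracker.update([(0.0, 0.0, 1.0), (0.2, 0.0, 1.0)])
view2 = tracker.update([(0.01, 0.0, 1.0)])
view3 = tracker.update([(0.01, 0.0, 1.0)])
view4 = tracker.update([(0.01, 0.0, 1.0), (0.2, 0.0, 1.0)])
print(sorted(view1), sorted(view4))  # [0, 1] [0, 2]
```

In the demo, the second object stays unseen for more viewpoints than `max_misses` allows, its track is deleted, and on reappearance it receives a new ID (2 instead of 1). This identity switch illustrates the kind of failure that prolonged occlusion can cause in a simple detect-then-associate pipeline; how often it occurs in practice depends on the detector, the gating and track-management choices, and the viewpoint sequence.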
