AgriSORT: A Simple Online Real-time Tracking-by-Detection framework for robotics in precision agriculture

Leonardo Saraceni, Ionut M. Motoi, Daniele Nardi, Thomas A. Ciarfuglia

TL;DR

AgriSORT is a simple, online, real-time tracking-by-detection pipeline for precision agriculture. It relies only on motion information, which allows accurate and fast propagation of tracks between frames.

Abstract

The problem of multi-object tracking (MOT) consists of detecting and tracking all the objects in a video sequence while keeping a unique identifier for each object. It is a challenging and fundamental problem for robotics. In precision agriculture, achieving a satisfactory solution is made harder by extreme camera motion, sudden illumination changes, and strong occlusions. Most modern trackers rely on the appearance of objects rather than their motion for association, which can be ineffective when most targets are static objects with the same appearance, as in the agricultural case. To this end, following SORT [5], we propose AgriSORT, a simple, online, real-time tracking-by-detection pipeline for precision agriculture based only on motion information, which allows for accurate and fast propagation of tracks between frames. The main focuses of AgriSORT are efficiency, flexibility, minimal dependencies, and ease of deployment on robotic platforms. We test the proposed pipeline on a novel MOT benchmark specifically tailored to the agricultural context, based on video sequences taken in a table grape vineyard, which is particularly challenging due to the strong self-similarity and density of the instances. Both the code and the dataset are available for future comparisons.

Paper Structure (14 sections, 5 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Agricultural robotic platform used in the EU Project CANOPIES for operations in table grape vineyards, equipped with an Intel RealSense D435i camera on the wrist. AgriSORT is implemented on this robot and is used for tracking grapes.
  • Figure 2: Overview of our AgriSORT tracker pipeline. The process starts with the estimation of the relative camera motion. First, we extract features with the Shi-Tomasi method in the previous and current frames and compute the optical flow via the Lucas-Kanade algorithm. From the resulting matches we estimate an affine transform that expresses the motion of the camera. The estimated matrix is then used to propagate the state of the previous tracks into the current frame. Finally, we compute the IoU between the propagated tracks and the current detections to perform the association and update the state of the tracklets.
  • Figure 3: Experimental setup for data acquisition: an Intel RealSense D435i camera mounted on a tripod for stabilization during movements.
  • Figure 4: Examples of difficult cases present in the experimental field: strong frontal illumination and motion blur due to fast motion (a), and occlusions due to leaves and branches (b).
  • Figure 5: Qualitative evaluation of the performance of AgriSORT and other SOTA trackers in the "CloseUp1" sequence at frame 20 (top row) and frame 80 (bottom row). AgriSORT (a) keeps most tracks consistent, without switching IDs, and produces the fewest false negatives. SORT (b) struggles in this sequence: it loses almost all the IDs because of fast non-linear motion and illumination changes. OC-SORT (c) and BYTE (d) perform better than SORT, in particular by not switching IDs, but they remain far from AgriSORT because they lose some of the tracks in both the foreground and the background.
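
The last two stages of the pipeline in the Figure 2 caption (propagating previous tracks with the estimated affine transform, then associating them with detections by IoU) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The names `warp_box`, `iou`, and `associate`, the `(x1, y1, x2, y2)` box format, the `iou_min` threshold, and the greedy matching strategy are all assumptions made for illustration. The 2x3 matrix `A` is taken as given; with OpenCV it could plausibly be obtained by chaining `cv2.goodFeaturesToTrack` (Shi-Tomasi), `cv2.calcOpticalFlowPyrLK` (Lucas-Kanade), and `cv2.estimateAffine2D`, though whether AgriSORT uses exactly these calls is not stated here.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]  # 2x3


def warp_box(box: Box, A: Affine) -> Box:
    """Warp the four box corners with A and return the enclosing axis-aligned box."""
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    xs = [A[0][0] * x + A[0][1] * y + A[0][2] for x, y in corners]
    ys = [A[1][0] * x + A[1][1] * y + A[1][2] for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    A: Affine,
    iou_min: float = 0.3,  # hypothetical threshold, not taken from the paper
) -> Tuple[Dict[int, int], List[int]]:
    """Propagate tracks with A, then match them to detections by IoU.

    Matching is greedy (highest IoU first), a simplification chosen to keep
    the sketch short; an optimal assignment solver could be used instead.
    Returns (track_id -> detection index, unmatched detection indices).
    """
    predicted = {tid: warp_box(box, A) for tid, box in tracks.items()}
    pairs = sorted(
        ((iou(pb, det), tid, j)
         for tid, pb in predicted.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, j in pairs:
        if score < iou_min:
            break  # remaining pairs overlap even less
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched


# Toy example: the camera motion shifts the image content 20 px to the right.
tracks = {1: (10, 10, 30, 30), 2: (50, 10, 70, 30)}
detections = [(71, 11, 91, 31), (30, 10, 50, 30), (200, 200, 220, 220)]
shift_right = ((1.0, 0.0, 20.0), (0.0, 1.0, 0.0))
matches, new_detections = associate(tracks, detections, shift_right)
# matches == {1: 1, 2: 0}; detection 2 is unmatched and could start a new track.
```

In this toy example, running the same association with the identity transform yields no matches at all, because the 20 px shift removes every overlap between the old boxes and the new detections. That is the situation the camera-motion compensation step is meant to handle when targets are static and look alike.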