When to Extract ReID Features: A Selective Approach for Improved Multiple Object Tracking

Emirhan Bayar, Cemal Aker

TL;DR

This paper investigates a selective approach to minimizing the overhead of ReID feature extraction while preserving accuracy, modularity, and ease of implementation, and demonstrates its effectiveness by applying it to StrongSORT and Deep OC-SORT.

Abstract

Many state-of-the-art (SOTA) Multiple Object Tracking (MOT) methods extract and match Re-Identification (ReID) features, which are particularly effective against frequent and long-term occlusions. While end-to-end object detection and tracking have been the main focus of recent research, they have yet to outperform traditional methods on benchmarks such as MOT17 and MOT20. Thus, from an application standpoint, methods with separate detection and embedding remain the best option for accuracy, modularity, and ease of implementation, although the overhead of feature extraction makes them impractical for edge devices. In this paper, we investigate a selective approach that minimizes this overhead while preserving accuracy, modularity, and ease of implementation. The approach can be integrated into various SOTA methods, and we demonstrate its effectiveness by applying it to StrongSORT and Deep OC-SORT. Experiments on the MOT17, MOT20, and DanceTrack datasets show that our mechanism retains the advantages of feature extraction during occlusions while significantly reducing runtime. Additionally, it improves accuracy by preventing confusion in the feature-matching stage, particularly in cases of deformation and appearance similarity, which are common in DanceTrack.

Code: https://github.com/emirhanbayar/Fast-StrongSORT, https://github.com/emirhanbayar/Fast-Deep-OC-SORT
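The selective idea described above can be illustrated with a minimal Python sketch: run the ReID network only on detections that look risky, and leave the rest to position-based matching. The risk rule here is an assumption, not the authors' criterion: a detection is flagged when its IoU with any other detection in the same frame exceeds a threshold. The names `occlusion_risk_mask`, `selective_features`, and `iou_thresh` (and its 0.2 default) are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
F = TypeVar("F")  # whatever embedding type the ReID model returns


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def occlusion_risk_mask(detections: Sequence[Box], iou_thresh: float = 0.2) -> List[bool]:
    """HYPOTHETICAL risk rule: a detection is risky if it overlaps any other
    detection in the same frame with IoU above iou_thresh."""
    risky = [False] * len(detections)
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            if iou(detections[i], detections[j]) > iou_thresh:
                risky[i] = risky[j] = True
    return risky


def selective_features(
    detections: Sequence[Box],
    extract_fn: Callable[[List[Box]], List[F]],
    iou_thresh: float = 0.2,
) -> List[Optional[F]]:
    """Call the expensive ReID extractor once, on risky detections only.
    Safe detections get None and would be matched by position alone."""
    mask = occlusion_risk_mask(detections, iou_thresh)
    risky_boxes = [d for d, r in zip(detections, mask) if r]
    # Skip the extractor entirely when no detection is risky.
    feats = iter(extract_fn(risky_boxes) if risky_boxes else [])
    return [next(feats) if r else None for r in mask]


if __name__ == "__main__":
    dets = [(0, 0, 10, 20), (5, 0, 15, 20), (100, 100, 110, 120)]
    print(occlusion_risk_mask(dets))  # [True, True, False]
    fake_reid = lambda boxes: [f"emb{k}" for k in range(len(boxes))]
    print(selective_features(dets, fake_reid))  # ['emb0', 'emb1', None]
```

Batching all risky crops into a single `extract_fn` call mirrors how ReID models are usually run on a GPU. Lowering `iou_thresh` flags more detections and therefore calls the ReID model more often. Figure 4 examines this kind of runtime trade-off across IoU thresholds for the actual method.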

Paper Structure

This paper contains 19 sections, 5 equations, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: In this scene, the green boxes are the bounding boxes of the tracklets, the blue boxes are detections with no occlusion risk, and the yellow boxes are detections with occlusion risk. The starting point of the proposed mechanism is the question: why extract features for the blue boxes when matching them by position alone is not risky?
  • Figure 2: Aspect ratio similarity vs $\alpha$ for different IoU values.
  • Figure 3: HOTA scores on the MOT17 and DanceTrack validation sets for all introduced parameters.
  • Figure 4: Bar plot of the share of runtime for different IoU thresholds, alongside the change in FPS and HOTA on MOT17 and DanceTrack.
  • Figure 5: Cases where an erroneous tracklet candidacy is caught by aspect ratio similarity (ARS) thresholding. Blue boxes are detections, red boxes are tracklets, and green boxes are tracklets that were selected as candidates for multiple or wrong detections but were rejected by ARS thresholding.
  • ...and 3 more figures
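Figure 5 describes rejecting an erroneous tracklet candidacy by thresholding aspect ratio similarity (ARS). The paper's own ARS formula (plotted against $\alpha$ in Figure 2) is not reproduced on this page, so this Python sketch substitutes an assumed measure: one minus the aspect-ratio consistency term $v$ from the Complete-IoU (CIoU) loss. The function names, the 0.9 threshold, and the candidate-index format are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), height assumed > 0


def aspect_ratio_similarity(a: Box, b: Box) -> float:
    """ASSUMED ARS (not the paper's formula): 1 - v, where v is the CIoU term
    v = (4 / pi^2) * (atan(w_a / h_a) - atan(w_b / h_b))^2.
    Returns 1.0 for identical aspect ratios and decreases toward 0 as they diverge."""
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    v = (4.0 / math.pi ** 2) * (math.atan(wa / ha) - math.atan(wb / hb)) ** 2
    return 1.0 - v


def filter_candidates(
    tracklet: Box,
    detections: Sequence[Box],
    candidate_idx: Sequence[int],
    ars_thresh: float = 0.9,  # hypothetical threshold
) -> List[int]:
    """Keep only candidate detections whose shape is consistent with the tracklet;
    the others are treated as erroneous candidacies and dropped."""
    return [i for i in candidate_idx
            if aspect_ratio_similarity(tracklet, detections[i]) >= ars_thresh]


if __name__ == "__main__":
    track = (0, 0, 10, 30)                   # tall box, w/h = 1/3
    dets = [(2, 0, 12, 30), (0, 0, 30, 10)]  # same shape vs. wide shape
    print(round(aspect_ratio_similarity(track, dets[1]), 3))  # 0.652
    print(filter_candidates(track, dets, [0, 1]))  # [0]
```

Taking the arctangent of width over height keeps the measure bounded in (0, 1] and symmetric in the two boxes. As a result, a wide detection cannot pass as a match for a tall tracklet merely because their IoU is acceptable.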