FusionSORT: Fusion Methods for Online Multi-object Visual Tracking

Nathanael L. Baisa

TL;DR

FusionSORT addresses data association in online multi-object tracking by systematically comparing four cue-fusion strategies that combine strong cues (motion and appearance) with weak cues (height-IoU and tracklet confidence). The method uses a Kalman filter (KF) based tracker with camera-motion compensation and a two-stage association, with appearance used in the first stage and IoU in the second. The study extends the KF state to include tracklet confidence and evaluates the fusion methods on MOT17, MOT20, and DanceTrack, revealing substantial performance differences across strategies and datasets. The findings guide practitioners toward the IoU-based weighted sum fusion when weak cues are available, and highlight the importance of selecting a fusion scheme according to cue availability and scene characteristics.

Abstract

In this work, we investigate four different fusion methods for associating detections to tracklets in multi-object visual tracking. In addition to strong cues such as motion and appearance information, we also consider weak cues such as height intersection-over-union (height-IoU) and tracklet confidence information in the data association. The four fusion methods are minimum, weighted sum based on IoU, Kalman filter (KF) gating, and Hadamard product of the costs from the different cues. We conduct extensive evaluations on the validation sets of the MOT17, MOT20 and DanceTrack datasets, and find that the choice of fusion method is key for data association in multi-object visual tracking. We hope that this investigative work helps the computer vision research community to use the right fusion method for data association in multi-object visual tracking.
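
To make the four fusion families named in the abstract concrete, here is a minimal NumPy sketch operating on two cost matrices (rows are tracklets, columns are detections). It is an illustration under stated assumptions, not the paper's formulation: the function names, the fixed `weight`, the `gate` threshold, and the `big` penalty are all hypothetical, the weak cues (height-IoU, tracklet confidence) are left out, and the gating variant thresholds a plain motion cost where a KF-based gate would typically threshold a distance derived from the filter's predicted state.

```python
import numpy as np

def fuse_min(motion_cost, appearance_cost):
    # Element-wise minimum: each pair takes the more favourable of the two cues.
    return np.minimum(motion_cost, appearance_cost)

def fuse_weighted_sum(motion_cost, appearance_cost, weight=0.5):
    # Convex combination of the two costs; `weight` is a hypothetical parameter.
    return weight * motion_cost + (1.0 - weight) * appearance_cost

def fuse_gating(motion_cost, appearance_cost, gate=0.8, big=1e5):
    # Keep the appearance cost, but forbid pairs whose motion cost exceeds the gate.
    # Assumption: a simple threshold stands in for a KF-derived gate.
    fused = appearance_cost.copy()
    fused[motion_cost > gate] = big
    return fused

def fuse_hadamard(motion_cost, appearance_cost):
    # Element-wise (Hadamard) product of the two cost matrices.
    return motion_cost * appearance_cost

# Toy costs for 2 tracklets x 2 detections (made-up numbers, lower is better).
motion = np.array([[0.2, 0.9],
                   [0.7, 0.1]])
appearance = np.array([[0.4, 0.3],
                       [0.6, 0.5]])

fused = {
    "min": fuse_min(motion, appearance),
    "weighted_sum": fuse_weighted_sum(motion, appearance),
    "gating": fuse_gating(motion, appearance),
    "hadamard": fuse_hadamard(motion, appearance),
}
for name, matrix in fused.items():
    print(name)
    print(matrix)
```

Whichever variant is used, the fused matrix would then go to an assignment solver (for example `scipy.optimize.linear_sum_assignment`) to match detections to tracklets. Note that the variants can disagree on the same inputs: here the gating variant rules out the pair (tracklet 0, detection 1) entirely, while the minimum variant still treats it as a fairly cheap match because its appearance cost is low.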

Paper Structure (22 sections, 20 equations, 4 tables)