AM-SORT: Adaptable Motion Predictor with Historical Trajectory Embedding for Multi-Object Tracking

Vitaliy Kim, Gunho Jung, Seong-Whan Lee

TL;DR

The paper tackles non-linear object motion and occlusions in multi-object tracking by replacing the Kalman Filter with a transformer-based adaptable motion predictor. It introduces a historical trajectory embedding that encodes spatio-temporal information from bounding-box sequences, and it uses a prediction token to estimate the bounding box in the current frame. Training uses trajectory segments of length $T+1$, a prediction loss $\mathcal{L}_{pred}$, and masking augmentation applied with probability $p$; the experiments use $T=30$. AM-SORT achieves competitive results on DanceTrack (56.3 IDF1 and 55.6 HOTA), improving association under non-linear motion while keeping computational cost low because it relies solely on motion information from bounding boxes.
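The training recipe summarized above (segments of length $T+1$, masking with probability $p$) can be illustrated with a minimal Python sketch. It assumes boxes are `(x, y, w, h)` tuples, that the first $T$ boxes of a segment form the input history and the last box is the target for the prediction token, and that masking replaces a history box with `None`. The helper names and the L1 stand-in for $\mathcal{L}_{pred}$ are hypothetical; the exact loss form and masking mechanism are not given in this summary.

```python
import random
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h); box format is an assumption


def make_training_sample(
    track: Sequence[Box],
    start: int,
    T: int = 30,
    p: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Optional[Box]], Box]:
    """Cut one segment of length T+1 from a track (hypothetical helper).

    The first T boxes form the history; each is masked to None with
    probability p. Box T+1 is the regression target.
    """
    rng = rng or random.Random()
    segment = track[start:start + T + 1]
    if len(segment) != T + 1:
        raise ValueError("track too short for a segment of length T+1")
    history = [None if rng.random() < p else box for box in segment[:T]]
    target = segment[T]
    return history, target


def l1_loss(pred: Box, target: Box) -> float:
    """Hypothetical stand-in for L_pred: mean absolute error over box coordinates."""
    return sum(abs(a - b) for a, b in zip(pred, target)) / len(target)


# Usage: a synthetic track moving 1 px/frame to the right.
track = [(float(i), 0.0, 10.0, 20.0) for i in range(40)]
history, target = make_training_sample(track, start=5, T=30, p=0.0)
print(len(history), target)  # 30 (35.0, 0.0, 10.0, 20.0)
```

With `p=0.0` no box is masked; with `p=1.0` every history entry becomes `None`, which forces the predictor to rely on whatever context remains.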

Abstract

Many multi-object tracking (MOT) approaches that employ the Kalman Filter as a motion predictor assume constant velocity and Gaussian-distributed filter noise. These assumptions make Kalman Filter-based trackers effective in linear motion scenarios. However, the same linear assumptions are a key limitation when estimating future object locations in scenarios involving non-linear motion and occlusions. To address this issue, we propose AM-SORT, a motion-based MOT approach with an adaptable motion predictor that adapts to estimate non-linear uncertainties. AM-SORT is a novel extension of the SORT-series trackers that replaces the Kalman Filter with a transformer-based motion predictor. We introduce a historical trajectory embedding that enables the transformer to extract spatio-temporal features from a sequence of bounding boxes. AM-SORT achieves performance competitive with state-of-the-art trackers on DanceTrack, with 56.3 IDF1 and 55.6 HOTA. Extensive experiments demonstrate the effectiveness of our method in predicting non-linear movement under occlusions.
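To make the linear-motion limitation concrete, the sketch below extrapolates a box center with a constant-velocity model, which is the transition a Kalman Filter's prediction step applies to its state mean (the covariance and the measurement update are omitted here). The trajectory and the post-occlusion position are made-up numbers loosely modeled on the direction change described for Figure 1, and the function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # box center (cx, cy); an illustrative simplification


def constant_velocity_predict(history: List[Point], steps: int = 1) -> Point:
    """Extrapolate the last observed velocity `steps` frames ahead."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # velocity from the last two observations
    return (x1 + steps * vx, y1 + steps * vy)


# Object moves left at 5 px/frame, then is occluded for 3 frames.
history = [(100.0, 50.0), (95.0, 50.0), (90.0, 50.0), (85.0, 50.0), (80.0, 50.0)]
pred = constant_velocity_predict(history, steps=4)  # frame where it reappears
true_x = 95.0  # hypothetical: it reversed direction while occluded
print(pred, abs(pred[0] - true_x))  # (60.0, 50.0) 35.0
```

The constant-velocity prediction keeps moving left and lands 35 px from the reappearing object, so an IoU-based match can fail and an ID switch becomes likely; a learned predictor that sees the whole history is meant to reduce this kind of error.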

Paper Structure (21 sections, 6 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Results on the dancetrack0004 sequence from DanceTrack for (a) OC-SORT and (b) AM-SORT (Ours). The object marked in yellow moves to the left and becomes occluded in the middle frame. After the occlusion it changes direction and moves to the right; OC-SORT does not capture this sudden directional shift, causing an ID switch from 13 to 10.
  • Figure 2: Comparison of (a) conventional transformer-based MOT frameworks and (b) our framework. The key difference lies in the input feature level: typical transformer-based approaches take frames as input and primarily use appearance information, whereas AM-SORT processes bounding boxes and relies solely on motion information.
  • Figure 3: Illustration of the overall AM-SORT pipeline. The historical trajectory of length $T$ is fed into the transformer encoder to estimate the track predictions $\mathcal{P}_{t}$. An off-the-shelf detector provides the detections $\mathcal{D}_{t}$. The Hungarian matching algorithm then associates $\mathcal{D}_{t}$ with $\mathcal{P}_{t}$, producing the final output tracks.
  • Figure 4: Illustration of the historical trajectory embedding in the motion predictor. The embedding encodes a comprehensive representation of a bounding-box sequence by jointly considering spatial and temporal information.
  • Figure 5: Qualitative comparison of OC-SORT and AM-SORT (Ours). The first row shows tracking results for the dancetrack0010 sequence, a scenario with non-linear changes of the bounding box; the second row shows results for dancetrack0019, a scenario with non-linear object movement during occlusions.
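The association step in the Figure 3 pipeline can be sketched as follows. This assumes an IoU-based cost between predicted boxes $\mathcal{P}_t$ and detections $\mathcal{D}_t$ in `(x1, y1, x2, y2)` format, plus a hypothetical acceptance threshold of 0.3; the captions specify neither the cost nor the threshold. `hungarian` is a plain-Python version of the Hungarian algorithm (shortest augmenting paths with row/column potentials); in practice a library solver such as SciPy's `linear_sum_assignment` would typically be used instead. All function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost: List[List[float]]) -> Dict[int, int]:
    """Minimum-cost assignment; returns {row: column}."""
    n, m = len(cost), len(cost[0])
    if n > m:  # solve the transposed problem and invert the mapping
        transposed = [list(col) for col in zip(*cost)]
        return {r: c for c, r in hungarian(transposed).items()}
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j]}


def associate(predictions: Sequence[Box], detections: Sequence[Box],
              iou_threshold: float = 0.3):
    """Match predictions to detections; return (matches, unmatched tracks, unmatched dets)."""
    if not predictions or not detections:
        return [], list(range(len(predictions))), list(range(len(detections)))
    cost = [[1.0 - iou(pr, d) for d in detections] for pr in predictions]
    matches = [(t, d) for t, d in hungarian(cost).items()
               if iou(predictions[t], detections[d]) >= iou_threshold]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(len(predictions)) if t not in matched_t]
    unmatched_dets = [d for d in range(len(detections)) if d not in matched_d]
    return sorted(matches), unmatched_tracks, unmatched_dets
```

For example, predictions `[(0, 0, 10, 10), (20, 20, 30, 30)]` and detections `[(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]` yield matches `[(0, 1), (1, 0)]` and leave detection 2 unmatched, which a SORT-style tracker would typically use to start a new track.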
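Figure 4 describes the historical trajectory embedding only at a high level. One plausible reading, sketched here purely as an assumption, projects each box to a $d$-dimensional vector (the spatial part), adds a sinusoidal encoding of its time index (the temporal part), substitutes a mask vector for masked boxes, and appends a prediction token at index $T$ whose encoder output would be regressed to the current-frame box. The class, its parameters, and the token placement are hypothetical, and the weights are random rather than learned.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h); assumed box format


def sinusoidal_encoding(t: int, d_model: int) -> List[float]:
    """Sine/cosine encoding of time index t (sine on even dims, cosine on odd)."""
    enc = []
    for i in range(d_model):
        angle = t / (10000 ** (2 * (i // 2) / d_model))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


class TrajectoryEmbedding:
    """Hypothetical sketch: spatial projection + temporal encoding + prediction token."""

    def __init__(self, d_model: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.d_model = d_model
        # Linear map from a 4-d box to d_model dims (random here; learned in practice).
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(d_model)]
        self.bias = [0.0] * d_model
        # Stand-ins for learned vectors.
        self.pred_token = [rng.gauss(0.0, 0.1) for _ in range(d_model)]
        self.mask_token = [0.0] * d_model

    def _project(self, box: Box) -> List[float]:
        return [sum(w * x for w, x in zip(row, box)) + b
                for row, b in zip(self.weight, self.bias)]

    def __call__(self, history: Sequence[Optional[Box]]) -> List[List[float]]:
        T = len(history)
        tokens = []
        for t, box in enumerate(history):
            spatial = self.mask_token if box is None else self._project(box)
            temporal = sinusoidal_encoding(t, self.d_model)
            tokens.append([s + e for s, e in zip(spatial, temporal)])
        # Prediction token placed at the time step to be predicted (index T).
        pred = [s + e for s, e in zip(self.pred_token, sinusoidal_encoding(T, self.d_model))]
        return tokens + [pred]


emb = TrajectoryEmbedding(d_model=8)
tokens = emb([(0.1, 0.2, 0.3, 0.4)] * 30)
print(len(tokens), len(tokens[0]))  # 31 8
```

The $T+1$ returned vectors would then form the input sequence of a transformer encoder, which is not shown here.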