AM-SORT: Adaptable Motion Predictor with Historical Trajectory Embedding for Multi-Object Tracking
Vitaliy Kim, Gunho Jung, Seong-Whan Lee
TL;DR
The paper tackles non-linear object motion and occlusions in multi-object tracking by replacing the Kalman Filter with a transformer-based adaptable motion predictor. It introduces a historical trajectory embedding that encodes spatio-temporal information from a sequence of bounding boxes, and uses a prediction token to estimate the bounding box in the current frame. Training uses trajectory segments of length $T+1$ ($T=30$ in the experiments), a prediction loss $\mathcal{L}_{pred}$, and a masking augmentation applied with probability $p$. AM-SORT achieves competitive results on DanceTrack (56.3 IDF1 and 55.6 HOTA), demonstrating improved association under non-linear motion while keeping the computational load low by relying solely on motion information from bounding boxes.
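The training setup described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes that the first $T$ boxes of a segment form the history and the last box is the prediction target, that boxes are stored as (x, y, w, h) tuples, and that a masked box is replaced by zeros. The helper names `make_segment` and `mask_history` and the value `p=0.1` are hypothetical.

```python
import random

def make_segment(track, start, T):
    """Cut T+1 consecutive boxes and split them into T history boxes and 1 target.

    Assumption: the last box of the segment is the prediction target.
    """
    window = track[start:start + T + 1]
    if len(window) != T + 1:
        raise ValueError("track too short for a segment of length T+1")
    return window[:T], window[T]

def mask_history(history, p, rng):
    """Mask each history box independently with probability p.

    Assumption: a masked box is replaced by zeros; the summary does not
    say how masked entries are represented.
    """
    zero_box = (0.0, 0.0, 0.0, 0.0)
    return [zero_box if rng.random() < p else box for box in history]

# Toy track: a box (x, y, w, h) drifting right by 1 px per frame.
track = [(float(t), 10.0, 4.0, 8.0) for t in range(40)]

history, target = make_segment(track, start=0, T=30)
masked = mask_history(history, p=0.1, rng=random.Random(0))
```

Under these assumptions, masking removes parts of the observed trajectory in a way that resembles missing detections during occlusion, so the predictor has to estimate the target box from an incomplete history.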
Abstract
Many multi-object tracking (MOT) approaches that employ the Kalman Filter as a motion predictor assume constant velocity and Gaussian-distributed filtering noise. These assumptions make Kalman Filter-based trackers effective in linear motion scenarios. However, the same linear assumptions become a key limitation when estimating future object locations in scenarios involving non-linear motion and occlusions. To address this issue, we propose a motion-based MOT approach with an adaptable motion predictor, called AM-SORT, which adapts to estimate non-linear uncertainties. AM-SORT is a novel extension of the SORT-series trackers that replaces the Kalman Filter with a transformer architecture as the motion predictor. We introduce a historical trajectory embedding that enables the transformer to extract spatio-temporal features from a sequence of bounding boxes. AM-SORT achieves competitive performance compared to state-of-the-art trackers on DanceTrack, with 56.3 IDF1 and 55.6 HOTA. We conduct extensive experiments to demonstrate the effectiveness of our method in predicting non-linear movement under occlusions.
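To make the constant-velocity limitation concrete, the sketch below extrapolates an object centre from its last two positions and compares the result with the true next position. It is a deliberate simplification that keeps only the constant-velocity assumption and omits the noise model and update step of a full Kalman Filter. The trajectories, step sizes, and function names are illustrative assumptions, not values from the paper.

```python
import math

def constant_velocity_predict(prev, curr):
    """Extrapolate the next centre assuming the velocity stays constant."""
    return (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])

def error(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Linear motion: 3 px per frame along x.
linear = [(3.0 * t, 0.0) for t in range(3)]

# Non-linear motion: circle of radius 50 px, 0.3 rad per frame.
circular = [(50 * math.cos(0.3 * t), 50 * math.sin(0.3 * t)) for t in range(3)]

linear_err = error(constant_velocity_predict(linear[0], linear[1]), linear[2])
circular_err = error(constant_velocity_predict(circular[0], circular[1]), circular[2])

print(f"linear error:   {linear_err:.3f} px")
print(f"circular error: {circular_err:.3f} px")
```

The extrapolation is exact for the linear trajectory and misses the circular one by a non-zero margin after a single frame. When an object goes undetected for several frames, as under occlusion, there is no new observation to correct the extrapolation, so this kind of error can grow.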
