Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
Sijia Chen, En Yu, Jinyang Li, Wenbing Tao
TL;DR
This work identifies a pronounced long-tail distribution in MOT trajectory lengths that biases tail-class learning. It introduces two viewpoint-aware data augmentations, Stationary Camera View Data Augmentation (SVA) and Dynamic Camera View Data Augmentation (DVA), along with a Group Softmax (GS) module to balance Re-ID across pedestrian classes with differing sample counts. The method, validated on MOT15/16/17/20 with two SOTA trackers, yields consistent improvements in MOTA, IDF1, and HOTA and demonstrates data-efficiency benefits when training data is limited. By addressing both information augmentation and classifier balance, the approach enhances tail-class recognition and robust trajectory association, particularly in crowded scenes where long tails are most severe.
Abstract
Multiple Object Tracking (MOT) is a critical area within computer vision, with a broad spectrum of practical applications. Current research has primarily focused on developing tracking algorithms and enhancing post-processing techniques. Yet the nature of the tracking data itself has not been thoroughly examined. In this study, we pioneer an exploration of the distribution patterns of tracking data and identify a pronounced long-tail distribution issue within existing MOT datasets. We note a significant imbalance in the distribution of trajectory lengths across different pedestrians, a phenomenon we refer to as "pedestrians trajectory long-tail distribution". To address this challenge, we introduce a strategy tailored to mitigate the effects of this skewed distribution. Specifically, we propose two data augmentation strategies designed for different camera viewpoint states, Stationary Camera View Data Augmentation (SVA) and Dynamic Camera View Data Augmentation (DVA), together with the Group Softmax (GS) module for Re-ID. SVA backtracks and predicts the pedestrian trajectories of tail classes, and DVA uses a diffusion model to change the background of the scene. GS divides the pedestrians into unrelated groups and performs the softmax operation on each group individually. Our proposed strategies can be integrated into numerous existing tracking systems, and extensive experiments validate the efficacy of our method in reducing the influence of the long-tail distribution on multi-object tracking performance. The code is available at https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT.
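To make the idea of a per-group softmax concrete, the following is a minimal sketch and not the released implementation. It assumes that pedestrian identities are grouped by trajectory length (number of samples per identity) using fixed thresholds. The function names, the `boundaries` values, and the toy data are hypothetical, and the actual GS module may differ in how it forms groups and computes its loss.

```python
import math
from collections import Counter


def trajectory_lengths(track_ids):
    """Count how many samples (frames) each pedestrian identity has."""
    return Counter(track_ids)


def assign_groups(lengths, boundaries):
    """Partition identities into groups by trajectory length.

    `boundaries` are hypothetical thresholds: (10, 100) yields three
    groups with lengths < 10, 10..99, and >= 100 (indices 0, 1, 2).
    """
    groups = {}
    for pid, n in lengths.items():
        group_index = sum(n >= b for b in boundaries)
        groups.setdefault(group_index, []).append(pid)
    return groups


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def group_softmax(logits, groups):
    """Normalize each identity's logit only against its own group."""
    probs = {}
    for members in groups.values():
        member_probs = softmax([logits[pid] for pid in members])
        for pid, p in zip(members, member_probs):
            probs[pid] = p
    return probs


# Toy data: two head identities, two medium ones, two tail ones.
track_ids = [1] * 120 + [2] * 150 + [3] * 15 + [4] * 12 + [5] * 3 + [6] * 2
lengths = trajectory_lengths(track_ids)
groups = assign_groups(lengths, boundaries=(10, 100))

# Hypothetical Re-ID classifier logits for one sample.
logits = {1: 5.0, 2: 4.0, 3: 1.0, 4: 0.5, 5: 0.2, 6: 0.1}
probs = group_softmax(logits, groups)
for index in sorted(groups):
    print(index, [(pid, round(probs[pid], 3)) for pid in groups[index]])
```

Because each group is normalized separately, the probabilities within every group sum to 1, so the large logits of head identities do not suppress the scores of tail identities in this sketch.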
