Transformer-based assignment decision network for multiple object tracking
Athena Psalta, Vasileios Tsironis, Konstantinos Karantzalos
TL;DR
This paper introduces the Transformer-based Assignment Decision Network (TADN) for data association in online tracking-by-detection MOT. TADN infers detection-to-target assignments directly in a single forward pass: it produces an Assignment Score Matrix $ASM$ over $N$ detections and $M+1$ targets (the $M$ active targets plus a $null$ target) and takes the row-wise argmax, $A_{final}=\text{argmax}(ASM)$, enabling end-to-end differentiable MOT within a simple tracking framework. The authors present two TADN architectures (single-branch and dual-branch), a training strategy that uses a Label Assignment Matrix (LAM) and a progressive predictor/teacher mix, and experiments on MOT17, MOT20, and UA-DETRAC that show competitive MOT metrics and association speeds of roughly 10 Hz. Although the baseline tracker is simple and lacks re-identification and occlusion handling, the results demonstrate TADN's potential as a lightweight, transferable data association module. The work points to integrating TADN into more sophisticated MOT systems and real-time applications, including embedded platforms.
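The row-wise argmax over the score matrix can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `assign_detections`, the placement of the $null$ target in the last column, and the score values are assumptions made for illustration only.

```python
from typing import List, Optional


def assign_detections(asm: List[List[float]]) -> List[Optional[int]]:
    """Pick one target per detection by row-wise argmax.

    `asm` is an N x (M+1) score matrix: one row per detection, one column
    per target. Assumption: the last column stands for the null target,
    so a detection that selects it is returned as None (unassigned).
    """
    assignments: List[Optional[int]] = []
    for row in asm:
        # Index of the highest score in this row (row-wise argmax).
        best = max(range(len(row)), key=row.__getitem__)
        null_index = len(row) - 1
        assignments.append(None if best == null_index else best)
    return assignments


# Hypothetical scores for N = 3 detections and M = 2 targets plus null.
asm = [
    [0.1, 0.8, 0.1],  # detection 0 prefers target 1
    [0.7, 0.2, 0.1],  # detection 1 prefers target 0
    [0.2, 0.1, 0.7],  # detection 2 prefers the null target
]
result = assign_detections(asm)
print(result)  # [1, 0, None]
```

The sketch only shows the decision step. It does not resolve the case where two detections select the same target, and it says nothing about how the scores themselves are produced by the network.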
Abstract
Data association is a crucial component of any multiple object tracking (MOT) method that follows the tracking-by-detection paradigm. To generate complete trajectories, such methods employ a data association process to establish assignments between detections and existing targets at each timestep. Recent data association approaches either solve a multi-dimensional linear assignment task or a network flow minimization problem, or tackle the problem via multiple hypothesis tracking. However, during inference an optimization step that computes optimal assignments is required for every frame of a sequence, which adds complexity to any given solution. To this end, we introduce the Transformer-based Assignment Decision Network (TADN), which tackles data association without the need for any explicit optimization during inference. In particular, TADN can directly infer assignment pairs between detections and active targets in a single forward pass of the network. We have integrated TADN into a rather simple MOT framework, designed a novel training strategy for efficient end-to-end training, and demonstrated the high potential of our approach for online visual tracking-by-detection MOT on several popular benchmarks, i.e. MOT17, MOT20 and UA-DETRAC. Our proposed approach demonstrates strong performance in most evaluation metrics despite its simple nature as a tracker lacking significant auxiliary components such as occlusion handling or re-identification. The implementation of our method is publicly available at https://github.com/psaltaath/tadn-mot.
