SelfMOTR: Revisiting MOTR with Self-Generating Detection Priors
Fabian Gülhan, Emil Mededovic, Yuli Wu, Johannes Stegmaier
TL;DR
SelfMOTR addresses the detection–association conflict in end-to-end transformer MOT by extracting and reusing the model's own detection signal as internally generated priors. It adds a detection-only forward pass that produces self proposals, which are then fed together with track queries into a shared decoder, keeping the pipeline detector-free and end-to-end. This decoupling stabilizes detection and strengthens association, achieving competitive results on DanceTrack (e.g., $\text{HOTA}=69.2$, $\text{IDF1}=72.5$) and MOT17 without external detectors. The findings suggest that the internal capacity of end-to-end transformers can be allocated more effectively between detection and association, yielding improvements without additional detector modules and motivating integration into stronger query-based trackers.
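The two-pass scheme summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Decoder` callable is a hypothetical stand-in for the shared transformer decoder. Queries are reduced to a reference box plus a score, whereas real MOTR queries also carry content embeddings. The top-`k`/`min_score` rule that turns detection-pass outputs into self proposals is an assumed selection criterion. The names `Query`, `self_proposals`, and `track_step` are illustrative only.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple

# (cx, cy, w, h) in normalized image coordinates (assumed representation)
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Query:
    box: Box                         # reference box used as a positional prior
    score: float = 0.0               # confidence predicted by the decoder
    track_id: Optional[int] = None   # None for detect/proposal queries


# Hypothetical stand-in for the shared decoder: maps input queries to refined
# output queries of the same length and order.
Decoder = Callable[[List[Query]], List[Query]]


def self_proposals(decoder: Decoder, detect_queries: List[Query],
                   k: int, min_score: float) -> List[Query]:
    """Pass 1 (detection only): decode detect queries with no track queries,
    then keep the top-k outputs scoring at least min_score as box priors."""
    dets = sorted(decoder(detect_queries), key=lambda q: q.score, reverse=True)
    kept = [q for q in dets[:k] if q.score >= min_score]
    # Proposals enter pass 2 only as priors, so their scores are reset.
    return [replace(q, score=0.0, track_id=None) for q in kept]


def track_step(decoder: Decoder, track_queries: List[Query],
               detect_queries: List[Query], k: int = 10,
               min_score: float = 0.5) -> Tuple[List[Query], List[Query]]:
    """Pass 2 (joint): decode track queries together with the self proposals.
    Returns (updated track outputs, newborn candidate outputs)."""
    proposals = self_proposals(decoder, detect_queries, k, min_score)
    outputs = decoder(track_queries + proposals)
    n = len(track_queries)
    return outputs[:n], outputs[n:]
```

In this sketch both passes call the same `decoder`, which reflects one reading of the "shared decoder" in the summary: the detection-only pass reuses the tracker's own weights to seed the joint pass, so no separate detector module is introduced.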
Abstract
Despite progress toward end-to-end tracking with transformer architectures, poor detection performance and the conflict between detection and association in a joint architecture remain critical concerns. Recent approaches aim to mitigate these issues by (i) employing advanced denoising or label assignment strategies, or (ii) incorporating detection priors from external object detectors via distillation or anchor proposal techniques. Inspired by the success of integrating detection priors and by the key insight that MOTR-like models are secretly strong detection models, we introduce SelfMOTR, a novel tracking transformer that relies on self-generated detection priors. Through extensive analysis and ablation studies, we uncover and demonstrate the hidden detection capabilities of MOTR-like models, and present a practical set of tools for leveraging them effectively. On DanceTrack, SelfMOTR achieves strong performance, competing with recent state-of-the-art end-to-end tracking methods.
