DynamicTrack: Advancing Gigapixel Tracking in Crowded Scenes
Yunqi Zhao, Yuchen Guo, Zheng Cao, Kai Ni, Ruqi Huang, Lu Fang
TL;DR
DynamicTrack tackles gigapixel crowded-scene tracking under severe occlusion with two components: a head–body joint detector trained with contrastive (Associative Embedding) learning, and a cascade-based dynamic association step that fuses head and body cues. The detector adds an embedding branch to a Faster R-CNN detector and learns head–body correspondences with two losses: $L_{pull}$, which draws the embeddings of a pedestrian's head and body together, and $L_{push}$, which separates the embeddings of different pedestrians. Empirically, DynamicTrack delivers state-of-the-art results on MOT20 and PANDA among two-stage trackers, and ablations confirm substantial gains from head cues and from head–body joint optimization. The work advances practical gigapixel tracking in crowded scenes, with implications for surveillance and pedestrian analysis, and points to future integration with transformer-based detectors for further improvements.
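To make the roles of $L_{pull}$ and $L_{push}$ concrete, here is a minimal sketch of pull and push losses in the general Associative Embedding style. The summary does not give the exact formulas, so everything here is an assumption for illustration: the scalar embeddings, the hinge form of the push term, the `margin` value, and the function name `pull_push_losses`.

```python
def pull_push_losses(head_emb, body_emb, margin=1.0):
    """Illustrative pull/push losses (assumed form, not the paper's exact one).

    head_emb[k] and body_emb[k] are scalar embeddings predicted for the
    head and the body of pedestrian k.
    """
    n = len(head_emb)
    if n == 0:
        return 0.0, 0.0

    # Reference embedding of each pedestrian: mean of its head and body tags.
    centers = [(h + b) / 2.0 for h, b in zip(head_emb, body_emb)]

    # Pull: head and body of the same pedestrian should share one embedding.
    pull = sum(
        (h - c) ** 2 + (b - c) ** 2
        for h, b, c in zip(head_emb, body_emb, centers)
    ) / n

    if n < 2:
        return pull, 0.0

    # Push: reference embeddings of different pedestrians should be at
    # least `margin` apart (hinge penalty, an assumed choice).
    push = 0.0
    for k in range(n):
        for j in range(n):
            if j != k:
                push += max(0.0, margin - abs(centers[k] - centers[j]))
    push /= n * (n - 1)
    return pull, push


# Two pedestrians whose head/body tags agree and are well separated.
print(pull_push_losses([0.0, 2.0], [0.0, 2.0]))
# Two pedestrians whose reference embeddings collide.
print(pull_push_losses([0.0, 0.5], [1.0, 0.5]))
```

In the first call both losses vanish because each head matches its body and the two pedestrians are farther apart than the margin. In the second call the first pedestrian's head and body disagree, and both reference embeddings coincide, so both terms are penalized.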
Abstract
Tracking in gigapixel scenarios has numerous potential applications in video surveillance and pedestrian analysis. Existing algorithms attempt to perform tracking in crowded scenes by utilizing multiple cameras or group relationships. However, their performance degrades significantly when confronted with the complex interactions and occlusions inherent in gigapixel images. In this paper, we introduce DynamicTrack, a dynamic tracking framework designed to address gigapixel tracking challenges in crowded scenes. In particular, we propose a dynamic detector that utilizes contrastive learning to jointly detect the head and body of pedestrians. Building upon this, we design a dynamic association algorithm that effectively utilizes head and body information for matching. Extensive experiments show that our tracker achieves state-of-the-art performance on widely used tracking benchmarks specifically designed for gigapixel crowded scenes.
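The idea of matching on both head and body information can be sketched as a small cascade. The text does not specify the stages, so this is only an assumed illustration: the stage order (body first, then head), IoU as the affinity, greedy matching, the threshold of 0.3, and the names `iou` and `cascade_associate` are all hypothetical choices, not the authors' algorithm.

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2); 0.0 if either box is missing."""
    if a is None or b is None:
        return 0.0
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def cascade_associate(tracks, detections, cues=("body", "head"), iou_thresh=0.3):
    """Greedy cascade: match on each cue in turn, passing leftovers on.

    tracks and detections are lists of dicts with optional "body" and
    "head" boxes. Returns (matches, unmatched_tracks, unmatched_detections),
    where each match is (track_index, detection_index, cue_used).
    """
    matches = []
    free_t = set(range(len(tracks)))
    free_d = set(range(len(detections)))
    for cue in cues:
        # Score every remaining track/detection pair on the current cue.
        pairs = sorted(
            ((iou(tracks[t].get(cue), detections[d].get(cue)), t, d)
             for t in free_t for d in free_d),
            reverse=True,
        )
        for score, t, d in pairs:
            if score < iou_thresh:
                break  # remaining pairs are weaker still
            if t in free_t and d in free_d:
                matches.append((t, d, cue))
                free_t.discard(t)
                free_d.discard(d)
    return matches, sorted(free_t), sorted(free_d)


tracks = [
    {"body": (0, 0, 10, 20), "head": (3, 0, 7, 4)},
    {"body": None, "head": (20, 0, 24, 4)},  # body occluded
]
detections = [
    {"body": (1, 0, 11, 20), "head": (4, 0, 8, 4)},
    {"body": None, "head": (20, 0, 24, 4)},  # body occluded
]
print(cascade_associate(tracks, detections))
```

In this toy example the first pedestrian is matched on the body cue, while the second, whose body box is missing, is only recovered in the head stage. A production tracker would more likely use a global assignment solver and motion prediction rather than greedy IoU matching.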
