One Graph to Track Them All: Dynamic GNNs for Single- and Multi-View Tracking
Martin Engilberge, Ivan Vrkic, Friedrich Wilke Grosche, Julien Pilet, Engin Turetken, Pascal Fua
TL;DR
The paper tackles online multi-people tracking under occlusion by unifying single- and multi-view scenarios within a single dynamic spatiotemporal graph. A Unified Message Passing Network (UMPN) updates edge and vertex representations, assigns probabilities to candidate connections between detections, and extracts trajectories online, optionally leveraging scene priors through camera vertices. The approach achieves state-of-the-art performance on WILDTRACK and MOT benchmarks. The paper also introduces SCOUT, a large-scale 25-view dataset with detailed scene reconstructions, built to study occlusions and scene context. The framework targets practical surveillance and monitoring by enabling robust, end-to-end reasoning over time, views, and scene geometry, and the dataset and code are to be released publicly.
Abstract
This work presents a unified, fully differentiable model for multi-people tracking that learns to associate detections into trajectories without relying on pre-computed tracklets. The model builds a dynamic spatiotemporal graph that aggregates spatial, contextual, and temporal information, enabling seamless information propagation across entire sequences. To improve occlusion handling, the graph can also encode scene-specific information. We also introduce a new large-scale dataset with 25 partially overlapping views, detailed scene reconstructions, and extensive occlusions. Experiments show the model achieves state-of-the-art performance on public benchmarks and the new dataset, with flexibility across diverse conditions. Both the dataset and approach will be publicly released to advance research in multi-people tracking.
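To make the idea of associating detections through a graph more concrete, here is a minimal toy sketch in plain Python. It is not the authors' model: the function names (`edge_scores`, `track_online`), the hand-written update rules, and the `threshold` value are all assumptions made for illustration, and nothing here is learned. It only mirrors the overall pattern described above: build edges between detections in consecutive frames, run one round of edge-to-vertex-to-edge message passing, score each candidate connection, and extend trajectories frame by frame.

```python
import math
from collections import defaultdict


def edge_scores(pos, edges):
    """One toy message-passing round with fixed (not learned) update rules."""
    # Initial edge representation: an affinity in (0, 1] that decays with distance.
    affinity = {(i, j): 1.0 / (1.0 + math.dist(pos[i], pos[j])) for i, j in edges}
    # Vertex update: each vertex aggregates the affinities of its incident edges.
    h = defaultdict(float)
    for (i, j), a in affinity.items():
        h[i] += a
        h[j] += a
    # Edge update: keep the raw affinity but down-weight edges whose endpoints
    # have other plausible partners. The result stays in (0, 1] and is used
    # here as a stand-in for a connection probability.
    return {(i, j): a * a / math.sqrt(h[i] * h[j]) for (i, j), a in affinity.items()}


def track_online(frames, threshold=0.4):
    """frames: list of per-frame lists of (x, y). Returns tracks of (t, x, y)."""
    tracks = []    # every trajectory found so far
    active = {}    # vertex id of a track's latest detection -> track index
    pos = {}       # vertex id -> (x, y)
    prev_ids = []
    next_id = 0
    for t, detections in enumerate(frames):
        curr_ids = []
        for xy in detections:
            pos[next_id] = xy
            curr_ids.append(next_id)
            next_id += 1
        # The graph grows over time: connect the previous frame to the new one.
        edges = [(i, j) for i in prev_ids for j in curr_ids]
        scores = edge_scores(pos, edges)
        used_prev, used_curr, new_active = set(), set(), {}
        # Greedy one-to-one association, highest-scoring edges first.
        for (i, j), p in sorted(scores.items(), key=lambda kv: -kv[1]):
            if p < threshold or i in used_prev or j in used_curr:
                continue
            used_prev.add(i)
            used_curr.add(j)
            k = active[i]
            tracks[k].append((t, *pos[j]))
            new_active[j] = k
        # Unmatched detections start new tracks. Tracks that found no match are
        # simply dropped here, so this toy does not handle occlusion at all.
        for j in curr_ids:
            if j not in used_curr:
                tracks.append([(t, *pos[j])])
                new_active[j] = len(tracks) - 1
        active, prev_ids = new_active, curr_ids
    return tracks


if __name__ == "__main__":
    frames = [[(0, 0), (10, 0)], [(0.5, 0), (10, 0.5)], [(1, 0), (10, 1)]]
    for track in track_online(frames):
        print(track)
```

The sketch only links consecutive frames in a single view and uses greedy matching. Longer temporal edges, multiple views, camera vertices, and learned updates, which the abstract names as central to the approach, are deliberately left out.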
