UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
Bishoy Galoaa, Xiangyu Bai, Utsav Nandi, Sai Siddhartha Vivek Dhir Rangoju, Somaieh Amraee, Sarah Ostadabbas
TL;DR
UniTrack addresses persistent identity maintenance in multi-object tracking (MOT) with a differentiable graph-theoretic loss that jointly optimizes detection accuracy, identity preservation, and spatiotemporal coherence. It casts MOT as a sliding-window flow optimization over a graph with balance variables and flow conservation, and uses adaptive Laplacian-based weighting to balance spatial and temporal terms. The universal loss plugs into existing MOT systems without architectural changes and delivers consistent improvements across TrackFormer, MOTR, FairMOT, ByteTrack, GTR, and MOTE on MOT17, MOT20, SportsMOT, and DanceTrack, including substantial reductions in ID switches and gains in IDF1/HOTA. The work provides theoretical convergence guarantees and analyzes frame-rate robustness. It notes a training-time overhead (~5% memory) and a current single-camera focus, with multi-camera tracking left as a future direction.
Abstract
We present UniTrack, a plug-and-play graph-theoretic loss function designed to enhance multi-object tracking (MOT) performance by directly optimizing tracking-specific objectives through unified differentiable learning. Unlike prior graph-based MOT methods that redesign tracking architectures, UniTrack provides a universal training objective that integrates detection accuracy, identity preservation, and spatiotemporal consistency into a single end-to-end trainable loss function, so it can be integrated with existing MOT systems without architectural modifications. Through differentiable graph representation learning, UniTrack enables networks to learn holistic representations of motion continuity and identity relationships across frames. We validate UniTrack on diverse tracking models (TrackFormer, MOTR, FairMOT, ByteTrack, GTR, and MOTE) and multiple challenging benchmarks, demonstrating consistent improvements across all tested architectures and datasets. Extensive evaluations show up to a 53% reduction in identity switches and up to a 12% improvement in IDF1 across challenging benchmarks, with GTR achieving the largest gain of 9.7% MOTA on SportsMOT.
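For readers less familiar with the metrics quoted above, the following minimal sketch shows how MOTA, IDF1, and a relative reduction in identity switches are conventionally computed from aggregate counts. It uses the standard CLEAR-MOT and identity-metric definitions, not anything specific to UniTrack; the function names and the example counts are hypothetical and chosen only for illustration.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Standard MOTA: 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Standard IDF1: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def relative_reduction(before, after):
    """Fractional reduction of a count (e.g. ID switches) relative to a baseline."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    print(f"MOTA: {mota(10, 5, 5, 100):.3f}")
    print(f"IDF1: {idf1(80, 10, 30):.3f}")
    print(f"ID-switch reduction: {relative_reduction(100, 47):.0%}")
```

Note that MOTA penalizes each identity switch only once, at the frame where it occurs, whereas IDF1 measures how consistently a trajectory keeps the correct identity over its whole length. This is why the two metrics can move differently when identity preservation improves.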
