OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback
Kai Luo, Hao Shi, Kunyu Peng, Fei Teng, Sheng Wu, Kaiwei Wang, Kailun Yang
TL;DR
OmniTrack++ tackles multi-object tracking in 360° panoramic imagery by unifying the end-to-end and tracking-by-detection paradigms within a trajectory-feedback loop. It introduces four interdependent components: a DynamicSSM Block for distortion-robust features, FlexiTrack Instances for short-term trajectory guidance, an ExpertTrack Memory for long-term identity modeling via a Shared Mixture-of-Experts, and a Tracklet Management module for adaptive paradigm switching. The EmboTrack benchmark, comprising QuadTrack and BipTrack, provides a challenging dataset for evaluating panoramic MOT in embodied robotics, and extensive experiments show state-of-the-art performance with substantial gains in HOTA and IDF1 over baselines. The work demonstrates strong robustness to egocentric motion and panoramic distortions, enabling practical panoramic perception for mobile robots and future long-term tracking in dynamic environments.
Abstract
This paper investigates Multi-Object Tracking (MOT) in panoramic imagery, which introduces unique challenges including a 360° Field of View (FoV), resolution dilution, and severe view-dependent distortions. Conventional MOT methods designed for narrow-FoV pinhole cameras generalize poorly under these conditions. To address panoramic distortion, the large search space, and identity ambiguity under a 360° FoV, OmniTrack++ adopts a feedback-driven framework that progressively refines perception with trajectory cues. A DynamicSSM block first stabilizes panoramic features, implicitly alleviating geometric distortion. On top of the normalized representations, FlexiTrack Instances use trajectory-informed feedback for flexible localization and reliable short-term association. To ensure long-term robustness, an ExpertTrack Memory consolidates appearance cues via a Mixture-of-Experts design, enabling recovery from fragmented tracks and reducing identity drift. Finally, a Tracklet Management module adaptively switches between end-to-end and tracking-by-detection modes according to scene dynamics, offering a balanced and scalable solution for panoramic MOT. To support rigorous evaluation, we establish the EmboTrack benchmark, a comprehensive dataset for panoramic MOT that includes QuadTrack, captured with a quadruped robot, and BipTrack, collected with a bipedal wheel-legged robot. Together, these datasets span wide-angle environments and diverse motion patterns, providing a challenging testbed for real-world panoramic perception. Extensive experiments on JRDB and EmboTrack demonstrate that OmniTrack++ achieves state-of-the-art performance, yielding substantial HOTA improvements of +25.5% on JRDB and +43.07% on QuadTrack over the original OmniTrack. Datasets and code will be made publicly available at https://github.com/xifen523/OmniTrack.
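For readers unfamiliar with the reported metrics, the following is a minimal sketch of how HOTA and IDF1 are commonly defined in the MOT literature. It is not the evaluation code used in this work. The function names and the input values in the usage note are illustrative assumptions, and the per-threshold detection and association accuracies are taken as given rather than computed from matched trajectories.

```python
import math
from typing import Sequence


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), the identity-level F1 score."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def hota_alpha(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold: geometric mean of DetA and AssA."""
    return math.sqrt(det_a * ass_a)


def hota(det_scores: Sequence[float], ass_scores: Sequence[float]) -> float:
    """Final HOTA: mean of the per-threshold HOTA values.

    det_scores[i] and ass_scores[i] are assumed to be DetA and AssA at the
    i-th localization threshold (hypothetical inputs for this sketch).
    """
    if len(det_scores) != len(ass_scores) or not det_scores:
        raise ValueError("need equally sized, non-empty score lists")
    per_threshold = [hota_alpha(d, a) for d, a in zip(det_scores, ass_scores)]
    return sum(per_threshold) / len(per_threshold)
```

As a usage example with made-up numbers, `idf1(80, 10, 30)` scores a tracker with 80 identity true positives, 10 identity false positives, and 30 identity false negatives, while `hota([0.64, 0.25], [0.36, 0.25])` averages over two localization thresholds. Because HOTA is a geometric mean, a tracker needs both good detection and good association to score well, which is why it is a natural headline metric for a method focused on identity consistency.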
