Ego-Motion Aware Target Prediction Module for Robust Multi-Object Tracking
Navid Mahdian, Mohammad Jani, Amir M. Soufi Enayati, Homayoun Najjaran
TL;DR
This work tackles robust multi-object tracking under ego-motion by reformulating Kalman Filter-based predictions to decouple ego-vehicle motion from target motion. The Ego-motion Aware Target Prediction (EMAP) module integrates camera motion projections and depth maps into the KF state, isolating rotational and translational ego-motion effects via two motion cues. EMAP, when added to four state-of-the-art SORT-based trackers, yields substantial reductions in identity switches and improvements in HOTA on KITTI and CARLA datasets, especially in scenarios with strong ego-motion. The approach offers practical gains for autonomous driving, enabling more reliable tracking when detections are intermittent or camera motion is significant, and points to RGB-only ego-motion estimation as a future extension.
Abstract
Multi-object tracking (MOT) is a prominent task in computer vision with applications in autonomous driving, concerned with simultaneously tracking the trajectories of multiple objects. Detection-based multi-object tracking (DBT) algorithms detect objects using an independent object detector and predict the imminent location of each target. Conventional prediction methods in DBT use a Kalman Filter (KF) to extrapolate the target location in upcoming frames by assuming a constant-velocity motion model. These methods are especially hindered in autonomous driving applications by dramatic camera motion or unavailable detections. Such limitations lead to tracking failures that manifest as numerous identity switches and disrupted trajectories. In this paper, we introduce a novel KF-based prediction module, the Ego-motion Aware Target Prediction (EMAP) module, which integrates camera motion and depth information with object motion models. Our proposed method decouples the impact of camera rotational and translational velocity from the object trajectories by reformulating the Kalman Filter. This reformulation enables us to reject the disturbances caused by camera motion and maximizes the reliability of the object motion model. We integrate our module with four state-of-the-art base MOT algorithms, namely OC-SORT, Deep OC-SORT, ByteTrack, and BoT-SORT. In particular, our evaluation on the KITTI MOT dataset demonstrates that EMAP reduces the number of identity switches (IDSW) of OC-SORT and Deep OC-SORT by 73% and 21%, respectively. At the same time, it improves other performance metrics such as HOTA by more than 5%. Our source code is available at https://github.com/noyzzz/EMAP.
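To illustrate the idea of separating camera-induced motion from a constant-velocity prediction, the following is a minimal sketch and not the EMAP formulation itself. It assumes a pinhole camera (x right, y down, z forward), a simplified state holding only the box center and its velocity in pixels, and the standard point image Jacobian from visual servoing to map camera velocity and depth to a pixel shift. The function names (`cv_predict`, `ego_flow`, `ego_aware_predict`), the noise value `q`, and the per-frame units are illustrative assumptions.

```python
import numpy as np

def cv_predict(x, P, dt=1.0, q=1e-2):
    """Constant-velocity KF predict step for state [u, v, du, dv] in pixels."""
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt          # position += velocity * dt
    Q = q * np.eye(4)               # assumed isotropic process noise
    return F @ x, F @ P @ F.T + Q

def ego_flow(u, v, depth, v_cam, w_cam, f, cx, cy):
    """Pixel velocity of a static point caused by camera motion alone.

    v_cam, w_cam: camera linear and angular velocity in the camera frame
    (assumed units: meters and radians per frame). depth: point depth in meters.
    """
    x, y = (u - cx) / f, (v - cy) / f        # normalized image coordinates
    vx, vy, vz = v_cam
    wx, wy, wz = w_cam
    # Translational part scales with 1/depth; rotational part is depth-free.
    x_dot = (-vx + x * vz) / depth + x * y * wx - (1 + x * x) * wy + y * wz
    y_dot = (-vy + y * vz) / depth + (1 + y * y) * wx - x * y * wy - x * wz
    return f * np.array([x_dot, y_dot])

def ego_aware_predict(x, P, depth, v_cam, w_cam, f, cx, cy, dt=1.0):
    """CV prediction of the object's own motion plus the camera-induced shift.

    Assumption: the velocity in the state models only the object's own
    motion, and the camera motion enters as a known control-like input.
    """
    x_pred, P_pred = cv_predict(x, P, dt)
    x_pred[:2] += dt * ego_flow(x[0], x[1], depth, v_cam, w_cam, f, cx, cy)
    return x_pred, P_pred

# Example: a target at the image center, camera yawing by 0.01 rad per frame.
state = np.array([600.0, 180.0, 2.0, 0.0])
pred, cov = ego_aware_predict(
    state, np.eye(4), depth=10.0,
    v_cam=(0.0, 0.0, 0.0), w_cam=(0.0, 0.01, 0.0),
    f=700.0, cx=600.0, cy=180.0,
)
```

In this sketch the rotational terms do not depend on depth while the translational terms scale with inverse depth, which is why a depth estimate is needed to account for camera translation but not for camera rotation.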
