Ego-Motion Aware Target Prediction Module for Robust Multi-Object Tracking

Navid Mahdian, Mohammad Jani, Amir M. Soufi Enayati, Homayoun Najjaran

TL;DR

This work tackles robust multi-object tracking under ego-motion by reformulating Kalman Filter-based predictions to decouple ego-vehicle motion from target motion. The Ego-motion Aware Target Prediction (EMAP) module integrates camera motion projections and depth maps into the KF state, isolating rotational and translational ego-motion effects via two motion cues. EMAP, when added to four state-of-the-art SORT-based trackers, yields substantial reductions in identity switches and improvements in HOTA on KITTI and CARLA datasets, especially in scenarios with strong ego-motion. The approach offers practical gains for autonomous driving, enabling more reliable tracking when detections are intermittent or camera motion is significant, and points to RGB-only ego-motion estimation as a future extension.
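To make the two motion cues concrete, the following is a minimal sketch of standard pinhole-camera geometry, not the paper's own formulation. It computes how far a static point would move in the image purely because the camera moved. The function name `ego_induced_shift`, the intrinsics `f`, `cx`, `cy`, and the convention `P_cur = R @ P_prev + t` are all assumptions made for illustration.

```python
import numpy as np

def ego_induced_shift(u, v, depth, f, cx, cy, R, t):
    """Pixel displacement of a static point caused only by camera motion.

    Assumed convention (hypothetical, for illustration): R and t map a point
    from the previous camera frame to the current one, P_cur = R @ P_prev + t.
    """
    # Back-project the pixel to a 3D point using its depth.
    P_prev = depth * np.array([(u - cx) / f, (v - cy) / f, 1.0])
    # Express the same static point in the current camera frame.
    P_cur = R @ P_prev + t
    # Re-project onto the image plane.
    u_new = f * P_cur[0] / P_cur[2] + cx
    v_new = f * P_cur[1] / P_cur[2] + cy
    return np.array([u_new - u, v_new - v])

f, cx, cy = 700.0, 640.0, 360.0

# Translation only: the shift scales with 1 / depth.
shift_t = ego_induced_shift(cx, cy, 10.0, f, cx, cy,
                            np.eye(3), np.array([-0.5, 0.0, 0.0]))

# Rotation only (yaw about the camera y-axis): the shift ignores depth.
a = np.deg2rad(2.0)
R_yaw = np.array([[np.cos(a), 0.0, np.sin(a)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(a), 0.0, np.cos(a)]])
shift_near = ego_induced_shift(700.0, 400.0, 5.0, f, cx, cy, R_yaw, np.zeros(3))
shift_far = ego_induced_shift(700.0, 400.0, 50.0, f, cx, cy, R_yaw, np.zeros(3))
```

In this sketch the rotational shift is the same for near and far points, while the translational shift depends on depth. That is one way to see why a depth map is needed only for the translational cue.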

Abstract

Multi-object tracking (MOT) is a prominent task in computer vision with applications in autonomous driving, where it is responsible for simultaneously tracking the trajectories of multiple objects. Detection-based multi-object tracking (DBT) algorithms detect objects using an independent object detector and predict the imminent location of each target. Conventional prediction methods in DBT use a Kalman Filter (KF) to extrapolate the target location in upcoming frames under a constant-velocity motion model. These methods are especially hindered in autonomous driving applications by dramatic camera motion or unavailable detections. Such limitations lead to tracking failures that manifest as numerous identity switches and disrupted trajectories. In this paper, we introduce a novel KF-based prediction module called the Ego-motion Aware Target Prediction (EMAP) module, which integrates camera motion and depth information with object motion models. Our proposed method decouples the impact of camera rotational and translational velocity from the object trajectories by reformulating the Kalman Filter. This reformulation enables us to reject the disturbances caused by camera motion and maximizes the reliability of the object motion model. We integrate our module with four state-of-the-art base MOT algorithms, namely OC-SORT, Deep OC-SORT, ByteTrack, and BoT-SORT. In particular, our evaluation on the KITTI MOT dataset demonstrates that EMAP reduces the number of identity switches (IDSW) of OC-SORT and Deep OC-SORT by 73% and 21%, respectively. At the same time, it raises other performance metrics such as HOTA by more than 5%. Our source code is available at https://github.com/noyzzz/EMAP.
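For readers unfamiliar with the baseline, the constant-velocity prediction step that conventional DBT trackers rely on can be sketched as follows. This is a simplified illustration, not code from the EMAP repository: the four-element state `[x, y, vx, vy]`, the function name `cv_predict`, and the process-noise value `q` are assumptions, and SORT-style trackers typically use a larger bounding-box state.

```python
import numpy as np

def cv_predict(x, P, dt=1.0, q=1e-2):
    """One Kalman Filter prediction step under a constant-velocity model.

    x: state [x, y, vx, vy] in image coordinates (illustrative layout).
    P: 4x4 state covariance.
    q: isotropic process-noise level (an arbitrary value for this sketch).
    """
    # Position advances by velocity * dt; velocity is assumed unchanged.
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    Q = q * np.eye(4)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

x0 = np.array([100.0, 50.0, 4.0, -2.0])
P0 = np.eye(4)
x1, P1 = cv_predict(x0, P0)
```

Because the velocity here is measured in the image plane, it mixes the object's own motion with the apparent motion induced by the camera. When the camera turns or accelerates, the constant-velocity assumption breaks even for objects that move steadily in the world, which is the failure mode the abstract describes.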

Paper Structure (17 sections, 12 equations, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 1: In this scenario, the object detection is missed, leading to failure in predicting the location of the object during lane-changing for Vanilla ByteTrack (a). ByteTrack + EMAP (b) significantly improves the prediction, successfully tracking the object in the next two frames.
  • Figure 2: Diagram illustrating the three phases of a detection-based multi-object tracking algorithm. The figure highlights the integration of the EMAP module within the system.
  • Figure 3: Visualization of town #10 in the CARLA simulator, with four distinct simulation scenarios superimposed on the map. The paths illustrate the diverse trajectories taken in our dataset, capturing a range of scenarios for comprehensive analysis.
  • Figure 4: HOTA vs. IDSW comparisons of OC-SORT, Deep OC-SORT, ByteTrack, and BoT-SORT on 21 KITTI train sequences, with and without the EMAP module. The height and width of each ellipse represent the standard deviation of the corresponding distribution.