MCTrack: A Unified 3D Multi-Object Tracking Framework for Autonomous Driving

Xiyang Wang, Shouzheng Qi, Jieyou Zhao, Hangning Zhou, Siyu Zhang, Guoan Wang, Kai Tu, Songlin Guo, Jianbo Zhao, Jian Li, Mu Yang

TL;DR

MCTrack tackles the lack of generalizability in 3D MOT by introducing a unified TBM framework that operates on a standardized BaseVersion format across KITTI, nuScenes, and Waymo. The core innovations are a decoupled Kalman-filter design for position, size, and heading, and Ro_GDIoU-based two-stage matching that combines BEV and RV perspectives to robustly associate trajectories. The paper also proposes motion-centric evaluation metrics (e.g., VAE, VNE, VDE) to quantify downstream-relevant motion outputs like velocity and acceleration. Empirically, MCTrack achieves SOTA performance on multiple datasets and demonstrates that Ro_GDIoU and secondary RV matching improve robustness, while BaseVersion reduces cross-dataset preprocessing burdens for researchers and practitioners.

Abstract

This paper introduces MCTrack, a new 3D multi-object tracking method that achieves state-of-the-art (SOTA) performance across KITTI, nuScenes, and Waymo datasets. Addressing the gap in existing tracking paradigms, which often perform well on specific datasets but lack generalizability, MCTrack offers a unified solution. Additionally, we have standardized the format of perceptual results across various datasets, termed BaseVersion, facilitating researchers in the field of multi-object tracking (MOT) to concentrate on the core algorithmic development without the undue burden of data preprocessing. Finally, recognizing the limitations of current evaluation metrics, we propose a novel set that assesses motion information output, such as velocity and acceleration, crucial for downstream tasks. The source code of the proposed method is available at https://github.com/megvii-research/MCTrack

Paper Structure

This paper contains 27 sections, 25 equations, 7 figures, 9 tables, 2 algorithms.

Figures (7)

  • Figure 1: The comparison of the proposed method with SOTA methods across different datasets. For the first time, we have achieved SOTA performance on all three datasets.
  • Figure 2: Overview of our unified 3D MOT framework MCTrack. Our input involves converting datasets such as KITTI, nuScenes, and Waymo into a unified format known as BaseVersion. The entire pipeline operates within the world coordinate system. Initially, we project 3D point coordinates from the world coordinate system onto the BEV plane for the primary matching phase. Subsequently, unmatched trajectory boxes and detection boxes are projected onto the image plane for secondary matching. Finally, the state of the trajectories is updated, along with the Kalman filter. Our output includes motion information such as position, velocity, and acceleration, which are essential for downstream tasks like prediction and planning.
  • Figure 3: BaseVersion data format overview.
  • Figure 4: The problem that arises when ${DIoU}$ is used for tracking.
  • Figure 5: Schematic of ${Ro\_GDIoU}$ calculation.
  • ...and 2 more figures
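
The two-stage association described in the Figure 2 caption (primary matching on the BEV plane, then secondary matching on the image plane for whatever is left) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: boxes are assumed to be already projected and axis-aligned, plain IoU stands in for Ro_GDIoU, greedy assignment stands in for whatever solver the paper uses, and all function names and thresholds are hypothetical.

```python
from typing import Dict, List, Tuple

Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou_2d(a: Box2D, b: Box2D) -> float:
    """Axis-aligned IoU. Assumption: a simple stand-in for Ro_GDIoU."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box2D], dets: Dict[int, Box2D],
                 thresh: float) -> List[Tuple[int, int]]:
    """Pair tracks and detections by descending IoU (hypothetical helper)."""
    scored = sorted(
        ((iou_2d(t, d), ti, di)
         for ti, t in tracks.items() for di, d in dets.items()),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in scored:
        if score < thresh:
            break  # remaining pairs score even lower
        if ti in used_t or di in used_d:
            continue
        used_t.add(ti)
        used_d.add(di)
        matches.append((ti, di))
    return matches


def two_stage_match(bev_tracks, bev_dets, img_tracks, img_dets,
                    bev_thresh=0.3, img_thresh=0.3):
    """Stage 1 on BEV boxes; stage 2 on image-plane boxes of the leftovers."""
    first = greedy_match(bev_tracks, bev_dets, bev_thresh)
    matched_t = {t for t, _ in first}
    matched_d = {d for _, d in first}
    left_t = {k: v for k, v in img_tracks.items() if k not in matched_t}
    left_d = {k: v for k, v in img_dets.items() if k not in matched_d}
    second = greedy_match(left_t, left_d, img_thresh)
    return first, second


# Toy data: detection 1 is displaced along depth, so it misses track 1 in BEV
# but still overlaps it on the image plane.
bev_tracks = {0: (0.0, 0.0, 2.0, 4.0), 1: (10.0, 0.0, 12.0, 4.0)}
bev_dets = {0: (0.1, 0.2, 2.1, 4.2), 1: (10.0, 6.0, 12.0, 10.0)}
img_tracks = {0: (100.0, 100.0, 200.0, 200.0), 1: (300.0, 100.0, 360.0, 180.0)}
img_dets = {0: (102.0, 101.0, 203.0, 199.0), 1: (305.0, 105.0, 362.0, 182.0)}

first, second = two_stage_match(bev_tracks, bev_dets, img_tracks, img_dets)
print(first, second)
```

In this toy case the first stage pairs track 0 with detection 0, and the second stage recovers the pair that the BEV comparison missed, which is the kind of robustness gain the summary attributes to secondary matching.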