3D Multi-Object Tracking: A Baseline and New Evaluation Metrics
Xinshuo Weng, Jianren Wang, David Held, Kris Kitani
TL;DR
The paper tackles real-time 3D multi-object tracking with a simple baseline that combines a 3D Kalman filter with Hungarian data association. The tracker operates directly on 3D detections and requires no training of its own. Its state is extended to 3D space (position, size, velocity, and heading), and a birth and death memory handles objects entering and leaving the scene. To enable fair, multi-threshold evaluation of 3D MOT systems, the paper introduces a 3D MOT evaluation tool and three integral metrics: AMOTA, AMOTP, and sAMOTA, where sAMOTA is bounded between 0% and 100%. These metrics address how confidence scores are used and how sensitive results are to a single threshold. On KITTI and nuScenes, the baseline achieves state-of-the-art 3D MOT performance and the fastest runtime (207.4 FPS on KITTI). Ablations examine the effect of detector quality, motion modeling, and the orientation-correction strategy. Together, the baseline and evaluation framework are intended to support fair comparison and real-world deployment of 3D MOT systems.
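The integral metrics named above can be illustrated with a minimal sketch. It assumes definitions that this summary does not spell out: MOTA is computed at several recall values and averaged to give AMOTA, and the scaled variant subtracts the misses that a recall of r makes unavoidable before normalizing by r times the number of ground-truth objects. The function names and the sample counts are hypothetical, chosen only for illustration.

```python
def mota(fp, fn, ids, num_gt):
    """Standard MOTA: one minus the error count normalized by ground-truth objects."""
    return 1.0 - (fp + fn + ids) / num_gt


def scaled_mota(fp, fn, ids, num_gt, recall):
    """Scaled MOTA at a given recall (assumed definition).

    At recall r at least (1 - r) * num_gt objects are missed by construction,
    so those misses are subtracted and the result is normalized by r * num_gt.
    Clamping at zero keeps the value inside [0, 1].
    """
    adjusted_errors = fp + fn + ids - (1.0 - recall) * num_gt
    return max(0.0, 1.0 - adjusted_errors / (recall * num_gt))


def average_over_recalls(per_recall, num_gt, scaled=False):
    """Average MOTA (AMOTA) or scaled MOTA (sAMOTA) over recall operating points.

    per_recall is a list of (recall, fp, fn, ids) tuples, one per threshold.
    """
    values = []
    for recall, fp, fn, ids in per_recall:
        if scaled:
            values.append(scaled_mota(fp, fn, ids, num_gt, recall))
        else:
            values.append(mota(fp, fn, ids, num_gt))
    return sum(values) / len(values)


# Hypothetical counts for 100 ground-truth objects at two recall values.
num_gt = 100
per_recall = [
    (0.5, 2, 50, 1),   # recall 0.5: 50 misses are unavoidable at this threshold
    (1.0, 10, 0, 2),   # recall 1.0: no misses, more false positives
]
amota = average_over_recalls(per_recall, num_gt)
samota = average_over_recalls(per_recall, num_gt, scaled=True)
print(f"AMOTA = {amota:.3f}, sAMOTA = {samota:.3f}")
```

In this example the low-recall operating point drags AMOTA down even though the tracker makes few avoidable errors there, while the scaled variant does not penalize the misses that the threshold itself forces.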
Abstract
3D multi-object tracking (MOT) is an essential component for many applications such as autonomous driving and assistive robotics. Recent work on 3D MOT focuses on developing accurate systems, giving less attention to practical considerations such as computational cost and system complexity. In contrast, this work proposes a simple real-time 3D MOT system. Our system first obtains 3D detections from a LiDAR point cloud. Then, a straightforward combination of a 3D Kalman filter and the Hungarian algorithm is used for state estimation and data association. Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in 2D space, and standardized 3D MOT evaluation tools are missing, which prevents a fair comparison of 3D MOT methods. Therefore, we propose a new 3D MOT evaluation tool along with three new metrics to comprehensively evaluate 3D MOT methods. We show that, although our system employs a combination of classical MOT modules, we achieve state-of-the-art 3D MOT performance on two 3D MOT benchmarks (KITTI and nuScenes). Surprisingly, although our system does not use any 2D data as input, we achieve competitive performance on the KITTI 2D MOT leaderboard. Our proposed system runs at a rate of $207.4$ FPS on the KITTI dataset, achieving the fastest speed among all modern MOT systems. To encourage standardized 3D MOT evaluation, our system and evaluation code are made publicly available at https://github.com/xinshuoweng/AB3DMOT.
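The predict, associate, and update loop described in the abstract can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the released implementation: it tracks only object centers with an independent constant-velocity Kalman filter per axis, uses center distance as the association cost, and finds the minimum-cost one-to-one matching by exhaustive search as a stand-in for the Hungarian algorithm (practical only for a handful of objects). Tracks are created immediately for unmatched detections and never deleted. All class names, function names, and noise values are hypothetical.

```python
from itertools import permutations
from math import dist, inf


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (state: position, velocity)."""

    def __init__(self, position, q=0.01, r=0.1):
        self.p, self.v = position, 0.0
        self.P = [[1.0, 0.0], [0.0, 10.0]]  # velocity starts highly uncertain
        self.q, self.r = q, r               # process and measurement noise (assumed)

    def predict(self):
        (a, b), (c, d) = self.P
        self.p += self.v                    # one-frame time step
        # P <- F P F^T + Q with F = [[1, 1], [0, 1]]
        self.P = [[a + b + c + d + self.q, b + d], [c + d, d + self.q]]
        return self.p

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r                      # innovation variance
        k0, k1 = a / s, c / s               # Kalman gain
        y = z - self.p                      # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


class Track:
    def __init__(self, track_id, center):
        self.id = track_id
        self.axes = [AxisKalman(c) for c in center]

    def predict(self):
        return tuple(axis.predict() for axis in self.axes)

    def update(self, center):
        for axis, z in zip(self.axes, center):
            axis.update(z)


def associate(predicted, detections, max_dist=2.0):
    """Minimum-cost matching by exhaustive search; returns {track_idx: det_idx}."""
    # None slots let a track stay unmatched at a fixed cost of max_dist.
    slots = list(range(len(detections))) + [None] * len(predicted)
    best, best_cost = {}, inf
    for perm in permutations(slots, len(predicted)):
        cost = 0.0
        for t, d in enumerate(perm):
            if d is None:
                cost += max_dist
            else:
                gap = dist(predicted[t], detections[d])
                cost += gap if gap <= max_dist else inf  # gate distant pairs
        if cost < best_cost:
            best_cost = cost
            best = {t: d for t, d in enumerate(perm) if d is not None}
    return best


def step(tracks, detections, next_id):
    """Process one frame: predict, associate, update, then start new tracks."""
    predicted = [track.predict() for track in tracks]
    matches = associate(predicted, detections)
    for t_idx, d_idx in matches.items():
        tracks[t_idx].update(detections[d_idx])
    matched = set(matches.values())
    for d_idx, det in enumerate(detections):
        if d_idx not in matched:
            tracks.append(Track(next_id, det))
            next_id += 1
    return next_id


# Hypothetical detections (object centers); frame order is shuffled on purpose.
frames = [
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(10.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (10.0, 2.0, 0.0), (50.0, 50.0, 0.0)],
]
tracks, next_id = [], 0
for detections in frames:
    next_id = step(tracks, detections, next_id)
for track in tracks:
    print(track.id, [round(axis.p, 2) for axis in track.axes])
```

In this toy sequence the two initial tracks keep their identities even though the detection order changes between frames, and the distant third detection in the last frame starts a new track. A fuller version would replace the distance cost with 3D box overlap, add size and heading to the state, and manage track birth and death over several frames.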
