3D Multi-Object Tracking: A Baseline and New Evaluation Metrics

Xinshuo Weng, Jianren Wang, David Held, Kris Kitani

TL;DR

The paper tackles real-time 3D multi-object tracking with a simple baseline: a 3D Kalman filter plus Hungarian data association, applied to 3D detections, with no training required for the tracker itself. The state is extended to 3D space (position, size, velocity, and heading), and a Birth and Death Memory handles objects entering and leaving the scene. To enable fair, multi-threshold evaluation of 3D MOT systems, the paper introduces a 3D MOT evaluation tool and three integral metrics, AMOTA, AMOTP, and sAMOTA (with sAMOTA bounded between 0 and 100%), which address confidence-score use and threshold sensitivity. Results on KITTI and nuScenes show state-of-the-art 3D MOT performance and the fastest runtime (207.4 FPS on KITTI). Ablations examine detector quality, motion modeling, and the orientation-correction strategy. The work provides a standardized baseline and evaluation framework to support fair comparisons and real-world deployment of 3D MOT systems.

Abstract

3D multi-object tracking (MOT) is an essential component for many applications such as autonomous driving and assistive robotics. Recent work on 3D MOT focuses on developing accurate systems, giving less attention to practical considerations such as computational cost and system complexity. In contrast, this work proposes a simple real-time 3D MOT system. Our system first obtains 3D detections from a LiDAR point cloud. Then, a straightforward combination of a 3D Kalman filter and the Hungarian algorithm is used for state estimation and data association. Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in 2D space, and standardized 3D MOT evaluation tools are missing for a fair comparison of 3D MOT methods. Therefore, we propose a new 3D MOT evaluation tool along with three new metrics to comprehensively evaluate 3D MOT methods. We show that, although our system employs a combination of classical MOT modules, we achieve state-of-the-art 3D MOT performance on two 3D MOT benchmarks (KITTI and nuScenes). Surprisingly, although our system does not use any 2D data as inputs, we achieve competitive performance on the KITTI 2D MOT leaderboard. Our proposed system runs at a rate of $207.4$ FPS on the KITTI dataset, achieving the fastest speed among all modern MOT systems. To encourage standardized 3D MOT evaluation, our system and evaluation code are made publicly available at https://github.com/xinshuoweng/AB3DMOT.
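The predict, associate, update, and birth/death loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and several simplifications are assumptions made here for brevity: objects are reduced to 3D centers, the matching cost is Euclidean center distance, the optimal assignment is found by exhaustive search (a stand-in that reaches the same optimum as the Hungarian algorithm on tiny inputs), and the Kalman update is replaced by adopting the matched detection directly. The names `associate`, `track_frame`, `confirmed_ids`, `max_dist`, `min_hits`, and `max_age` are hypothetical.

```python
from itertools import permutations
from math import dist


def associate(predicted, detections, max_dist=2.0):
    """Minimum-cost one-to-one matching of predicted centers to detections.

    Exhaustive search over assignments; only practical for a handful of objects.
    Returns (matches, unmatched_track_idx, unmatched_detection_idx).
    """
    n_t, n_d = len(predicted), len(detections)
    cost = [[dist(p, d) for d in detections] for p in predicted]
    if n_t <= n_d:  # every track is assigned a distinct detection
        candidates = (list(enumerate(perm))
                      for perm in permutations(range(n_d), n_t))
    else:  # every detection is assigned a distinct track
        candidates = ([(t, d) for d, t in enumerate(perm)]
                      for perm in permutations(range(n_t), n_d))
    best = min(candidates, key=lambda pairs: sum(cost[t][d] for t, d in pairs))
    # Reject assigned pairs that are too far apart to be the same object.
    matches = [(t, d) for t, d in best if cost[t][d] <= max_dist]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    return (matches,
            [t for t in range(n_t) if t not in matched_t],
            [d for d in range(n_d) if d not in matched_d])


def track_frame(tracks, detections, next_id, max_age=2):
    """Advance the tracker by one frame; returns (tracks, next_id)."""
    # Constant-velocity prediction of each trajectory's center.
    predicted = [tuple(c + v for c, v in zip(t["center"], t["velocity"]))
                 for t in tracks]
    matches, unmatched_t, unmatched_d = associate(predicted, detections)
    # Matched trajectories adopt the detection (simplified: no Kalman gain).
    for t, d in matches:
        trk = tracks[t]
        trk["velocity"] = tuple(n - o for n, o in zip(detections[d], trk["center"]))
        trk["center"] = detections[d]
        trk["hits"] += 1
        trk["misses"] = 0
    # Unmatched trajectories coast on their prediction and age.
    for t in unmatched_t:
        tracks[t]["center"] = predicted[t]
        tracks[t]["misses"] += 1
    # Death: drop trajectories unmatched for max_age consecutive frames.
    kept = [t for t in tracks if t["misses"] < max_age]
    # Birth: unmatched detections start tentative trajectories.
    for d in unmatched_d:
        kept.append({"id": next_id, "center": detections[d],
                     "velocity": (0.0, 0.0, 0.0), "hits": 1, "misses": 0})
        next_id += 1
    return kept, next_id


def confirmed_ids(tracks, min_hits=2):
    """IDs of trajectories matched often enough to be reported."""
    return [t["id"] for t in tracks if t["hits"] >= min_hits]
```

Calling `track_frame` once per frame with that frame's detection centers keeps the trajectory list current. In practice the exhaustive search would be swapped for a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`, and the cost for a 3D box overlap measure.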

Paper Structure

This paper contains 18 sections, 7 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: MOTA of modern 2D and 3D MOT systems on the KITTI 2D MOT leaderboard. Higher and further to the right is better. Our 3D MOT system achieves competitive MOTA in 2D MOT evaluation while being the fastest.
  • Figure 2: Proposed System Pipeline: (A) a 3D detection module obtains 3D detections $D_t$ from the LiDAR point cloud; (B) a 3D Kalman filter predicts the state of trajectories $T_{t-1}$ to the current frame $t$ as $T_\text{est}$ during the state prediction step; (C) the detections $D_t$ and predicted trajectories $T_\text{est}$ are associated using the Hungarian algorithm; (D) the state of each matched trajectory in $T_\text{match}$ is updated by the 3D Kalman filter based on the corresponding matched detection in $D_\text{match}$ to obtain the final trajectories $T_t$; (E) a birth and death memory takes the unmatched detections $D_\text{unmatch}$ and unmatched trajectories $T_\text{unmatch}$ as inputs and creates new trajectories $T_\text{new}$ and deletes disappeared trajectories $T_\text{lost}$ from the associated trajectories.
  • Figure 3: (a)(b)(c) The effect of the confidence threshold on the CLEAR metrics: MOTA, FN and FP. We evaluate our 3D MOT system on the KITTI dataset using the proposed 3D MOT evaluation tool. We show that, to achieve the highest MOTA, a proper confidence threshold needs to be selected; otherwise, MOTA drops significantly due to a large number of false positives or false negatives. (d) Effect of scale adjustment in MOTA: the proposed scaled accuracy sMOTA has an upper bound of $100\%$ at any recall value.
  • Figure 4: Qualitative comparison between FANTrack [Baser2019] (left) and our system (right) on sequence 3 of the KITTI test set.
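
The scale adjustment mentioned in the Figure 3(d) caption can be made concrete with a short sketch. The formulas below are not stated in this excerpt; they follow the definitions as commonly cited from the paper (MOTA = 1 - (FP + FN + IDS) / num_gt, with sMOTA removing the false negatives that are unavoidable at recall r and rescaling by r) and should be read as an assumption. The function names and the `per_recall` input layout are hypothetical.

```python
def mota(fp, fn, ids, num_gt):
    """CLEAR MOTA at one operating point."""
    return 1.0 - (fp + fn + ids) / num_gt


def smota(fp, fn, ids, num_gt, recall):
    """Scaled MOTA at a given recall (assumed definition, see lead-in).

    At recall r at least (1 - r) * num_gt objects are missed, so plain MOTA
    cannot exceed r. Subtracting those unavoidable misses and dividing by r
    restores an upper bound of 1 at every recall value.
    """
    unavoidable_fn = (1.0 - recall) * num_gt
    return max(0.0, 1.0 - (fp + fn + ids - unavoidable_fn) / (recall * num_gt))


def integral_metrics(per_recall, num_gt):
    """Average MOTA and sMOTA over recall operating points.

    per_recall: list of (recall, fp, fn, ids) tuples, one per recall value.
    Returns (AMOTA, sAMOTA).
    """
    n = len(per_recall)
    amota = sum(mota(fp, fn, ids, num_gt) for _, fp, fn, ids in per_recall) / n
    samota = sum(smota(fp, fn, ids, num_gt, r) for r, fp, fn, ids in per_recall) / n
    return amota, samota
```

For a hypothetical tracker that makes no mistakes beyond the misses forced by each recall level, plain MOTA equals the recall at each point, so AMOTA stays well below 1, while sMOTA is 1 at every point. This is the motivation for reporting the scaled variant.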