ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box

Yifu Zhang, Xinggang Wang, Xiaoqing Ye, Wei Zhang, Jincheng Lu, Xiao Tan, Errui Ding, Peize Sun, Jingdong Wang

TL;DR

ByteTrackV2 is a simple, unified, motion-based framework for 2D and 3D multi-object tracking. Its hierarchical data association keeps low-score detection boxes instead of discarding them, which recovers true objects and reduces missed objects and fragmented trajectories. For 3D tracking, it adds a complementary motion prediction strategy that combines detected velocities with a Kalman filter to handle abrupt motion and short-term disappearance. The tracker is nonparametric and can be integrated with various detectors. It leads the nuScenes 3D MOT leaderboard in both camera (56.4% AMOTA) and LiDAR (70.1% AMOTA) modalities and performs strongly on MOT17, MOT20, HiEve, and BDD100K.

Abstract

Multi-object tracking (MOT) aims to estimate the bounding boxes and identities of objects across video frames. Detection boxes serve as the basis of both 2D and 3D MOT. Detection scores inevitably fluctuate, which causes objects to be missed during tracking. We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes, which alleviates the problems of missed objects and fragmented trajectories. This simple and generic data association strategy is effective in both 2D and 3D settings. In 3D scenarios, it is much easier for the tracker to predict object velocities in world coordinates. We propose a complementary motion prediction strategy that combines the detected velocities with a Kalman filter to address abrupt motion and short-term disappearance. ByteTrackV2 leads the nuScenes 3D MOT leaderboard in both camera (56.4% AMOTA) and LiDAR (70.1% AMOTA) modalities. Furthermore, it is nonparametric and can be integrated with various detectors, making it appealing for real applications. The source code is released at https://github.com/ifzhang/ByteTrack-V2.
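As a rough illustration of the hierarchical data association described in the abstract, the sketch below matches predicted track boxes first against high-score detections and then matches the still-unmatched tracks against low-score detections. It is a minimal sketch under stated assumptions, not the released implementation: boxes are 2D `(x1, y1, x2, y2)` tuples, similarity is plain IoU, matching is greedy rather than an optimal assignment, the predicted track boxes are given as input (motion prediction is omitted), and the names `associate` and `greedy_match` as well as the values `high_thresh=0.5` and `min_iou=0.3` are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], det_ids: List[int],
                 det_boxes: List[Box], min_iou: float) -> Dict[int, int]:
    """Pair track ids with detection indices in order of decreasing IoU.

    Greedy matching is a simplification chosen for this sketch.
    """
    pairs = sorted(
        ((iou(box, det_boxes[d]), tid, d) for tid, box in tracks.items() for d in det_ids),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, d in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if tid in matches or d in used:
            continue
        matches[tid] = d
        used.add(d)
    return matches


def associate(predicted_tracks: Dict[int, Box], det_boxes: List[Box],
              det_scores: List[float], high_thresh: float = 0.5,
              min_iou: float = 0.3) -> Tuple[Dict[int, int], List[int]]:
    """Two-stage association: high-score detections first, then low-score ones."""
    high = [i for i, s in enumerate(det_scores) if s >= high_thresh]
    low = [i for i, s in enumerate(det_scores) if s < high_thresh]

    # Stage 1: all tracks against high-score detections.
    first = greedy_match(predicted_tracks, high, det_boxes, min_iou)

    # Stage 2: tracks left over from stage 1 against low-score detections.
    remaining = {t: b for t, b in predicted_tracks.items() if t not in first}
    second = greedy_match(remaining, low, det_boxes, min_iou)

    matches = {**first, **second}
    # Unmatched high-score detections are candidates for new tracks;
    # unmatched low-score detections are treated as background.
    unmatched_high = [i for i in high if i not in first.values()]
    return matches, unmatched_high


if __name__ == "__main__":
    tracks = {1: (0, 0, 10, 10), 2: (20, 0, 30, 10)}
    boxes = [(1, 0, 11, 10), (21, 0, 31, 10), (50, 50, 60, 60)]
    scores = [0.9, 0.3, 0.2]  # second object is partly occluded, third is clutter
    print(associate(tracks, boxes, scores))
```

In this toy input, a tracker that keeps only detections scoring at least 0.5 would lose track 2, whereas the second stage recovers it from the 0.3-score box and leaves the non-overlapping 0.2-score box unmatched.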

Paper Structure

This paper contains 19 sections, 3 equations, 8 figures, and 10 tables.

Figures (8)

  • Figure 1: Illustration of 2D multi-object tracking and 3D multi-object tracking. The first row shows the visualization of 2D MOT, which is performed on the image plane. The second row and the third row show the visualization of 3D MOT from the multi-view images and the Bird's Eye View (BEV) of the LiDAR point clouds, respectively. The same colors represent the same object identities.
  • Figure 2: Examples of our method, which associates every detection box. (a) shows all the detection boxes with their scores. (b) shows the tracklets obtained by previous methods, which associate only detection boxes whose scores are higher than a threshold, i.e., 0.5. The same box color represents the same identity. (c) shows the tracklets obtained by our method. The dashed boxes represent the boxes predicted for the previous tracklets using a Kalman filter. The two low-score detection boxes are correctly matched to the previous tracklets based on their large IoU. The number in yellow denotes the score of the box.
  • Figure 3: Comparison of the performance of BYTE and SORT under different detection score thresholds. The results are from the validation set of MOT17.
  • Figure 4: Comparison of the number of TPs and FPs among all low-score detection boxes and among the low-score tracked boxes obtained by BYTE. The results are from the validation set of MOT17.
  • Figure 5: Visualization results of ByteTrack under the 2D MOT setting. We select 6 sequences from the validation set of MOT17 and show the effectiveness of ByteTrack in handling difficult cases such as occlusion and motion blur. The yellow triangle represents a high-score box and the red triangle represents a low-score box. The same box color represents the same identity.
  • ...and 3 more figures