BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

Yuxiang Yang, Yingqi Deng, Mian Pan, Zheng-Jun Zha, Jing Zhang

TL;DR

BEVTrack directly estimates object motion in Bird's-Eye View (BEV) using a single regression loss. To improve accuracy for targets with diverse attributes, it learns adaptive likelihood functions tailored to individual targets, avoiding the fixed distribution assumptions of previous methods.

Abstract

3D Single Object Tracking (SOT) is a fundamental task in computer vision and plays a critical role in applications like autonomous driving. However, existing algorithms often involve complex designs and multiple loss functions, making model training and deployment challenging. Furthermore, their reliance on fixed probability distribution assumptions (e.g., Laplacian or Gaussian) hinders their ability to adapt to diverse target characteristics such as varying sizes and motion patterns, ultimately affecting tracking precision and robustness. To address these issues, we propose BEVTrack, a simple yet effective motion-based tracking method. BEVTrack directly estimates object motion in Bird's-Eye View (BEV) using a single regression loss. To enhance accuracy for targets with diverse attributes, it learns adaptive likelihood functions tailored to individual targets, avoiding the limitations of fixed distribution assumptions in previous methods. This approach provides valuable priors for tracking and significantly boosts performance. Comprehensive experiments on KITTI, NuScenes, and Waymo Open Dataset demonstrate that BEVTrack achieves state-of-the-art results while operating at 200 FPS, enabling real-time applicability. The code will be released at https://github.com/xmm-prio/BEVTrack.
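
The link between a regression loss and a distribution assumption, which the abstract refers to, can be made concrete with a small sketch. Under a Laplacian with a fixed scale, the negative log-likelihood (NLL) of a residual is an L1 loss plus a constant; under a Gaussian with a fixed standard deviation, it is a scaled L2 loss plus a constant. The third function lets the scale vary per target, which is one simple way to make a likelihood adaptive. It is an illustrative stand-in only and is not claimed to be BEVTrack's formulation; all function names here are hypothetical.

```python
import math


def laplace_nll(residual, scale=1.0):
    """NLL of a residual under Laplace(0, scale); fixed scale gives L1 + const."""
    return math.log(2.0 * scale) + abs(residual) / scale


def gaussian_nll(residual, sigma=1.0):
    """NLL of a residual under Normal(0, sigma^2); fixed sigma gives L2 + const."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + residual ** 2 / (2.0 * sigma ** 2)


def adaptive_laplace_nll(residual, log_scale):
    """Laplace NLL where the log-scale is an input (e.g. predicted per target).

    Assumption for illustration: a network would output log_scale alongside
    the motion estimate. This is NOT taken from the paper.
    """
    scale = math.exp(log_scale)
    return math.log(2.0) + log_scale + abs(residual) / scale


if __name__ == "__main__":
    r = 2.0
    # With fixed parameters, the NLL differs from L1 / L2 only by a constant.
    print(laplace_nll(r) - laplace_nll(0.0))    # equals |r|
    print(gaussian_nll(r) - gaussian_nll(0.0))  # equals r^2 / 2
    # With a free scale, the NLL is lowest when the scale matches |r|.
    for s in (1.0, 2.0, 4.0):
        print(s, adaptive_laplace_nll(r, math.log(s)))
```

In this toy setting a fixed scale penalizes every target identically, whereas a per-target scale lets the loss tolerate larger residuals where they are expected. That is the intuition behind moving away from a single fixed distribution.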

Paper Structure (23 sections, 5 equations, 6 figures, 9 tables)

Figures (6)

  • Figure 1: Comparison with typical 3D SOT paradigms. Previous methods mainly rely on point-based representations and decompose the tracking problem into multiple subtasks, leading to a complicated tracking framework. In contrast, BEVTrack simplifies the tracking pipeline with a single regression loss.
  • Figure 2: Architecture of BEVTrack. The framework has three parts: voxel-based feature extraction, BEV-based motion modeling, and distribution-aware regression. BEVTrack is a simple tracking baseline with a plain convolutional architecture and a single regression loss, yet it demonstrates state-of-the-art performance.
  • Figure 3: Object motion patterns vary across scenes. The top rows show point cloud scenes across two consecutive frames, where the red points indicate the target. The bottom rows visualize heatmaps of the BEV response map with ground-truth bounding boxes (red rectangles).
  • Figure 4: Visualization results on different KITTI categories: (a) Car; (b) Pedestrian; (c) Van; (d) Cyclist.
  • Figure 5: Robustness under different challenging scenes on the KITTI Pedestrian category.
  • ...and 1 more figure
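
As a rough illustration of what a Bird's-Eye View representation is, the sketch below flattens a 3D point cloud onto a top-down grid by counting points per cell. It is only a minimal occupancy-count example under assumed grid ranges and cell size. The pipeline described in Figure 2 extracts voxel features with a network before forming BEV features, which this sketch does not attempt to reproduce. The function name and parameters are hypothetical.

```python
def points_to_bev(points, x_range=(-4.0, 4.0), y_range=(-4.0, 4.0), cell=1.0):
    """Count points per top-down grid cell, ignoring height.

    points: iterable of (x, y, z) tuples in metres.
    Returns a list of rows (indexed by y) of per-cell point counts.
    Ranges and cell size are illustrative defaults, not values from the paper.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[0] * nx for _ in range(ny)]
    for x, y, _z in points:
        # Drop points outside the region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp guards against floating-point edge cases at the upper bound.
        col = min(int((x - x_range[0]) // cell), nx - 1)
        row = min(int((y - y_range[0]) // cell), ny - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    cloud = [(0.5, 0.5, 0.0), (0.6, 0.4, 1.0), (-3.5, 3.5, 0.2), (10.0, 0.0, 0.0)]
    for row in points_to_bev(cloud):
        print(row)
```

Because height is collapsed, an object's frame-to-frame translation and yaw change appear as a 2D shift and rotation in the grid, which is what makes a top-down view a natural space for motion estimation.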