Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check

Sheng-Yao Kuan, Jen-Hao Cheng, Hsiang-Wei Huang, Wenhao Chai, Cheng-Yen Yang, Hugo Latapie, Gaowen Liu, Bing-Fei Wu, Jenq-Neng Hwang

TL;DR

CRAFTBooster tackles the challenge of surpassing single-modality detectors for 3D MOT by introducing an online, cross-modality framework that fuses camera and radar at the tracking stage. It decomposes the system into three modules (Inner-modality Matching, Cross-modality Check, and Multi-modality Fusion), exploiting perspective-view camera detections and BEV radar detections to recover missed tracklets and fuse observations. Empirically, it yields IDF1 gains of about 5-6% on K-Radar and 1-2% on CRUW3D, demonstrating robustness across diverse weather conditions and compatibility with existing online trackers. The work highlights the practical potential of dedicated tracking-stage fusion for reliable 3D MOT in autonomous driving.

Abstract

In the domain of autonomous driving, the integration of multi-modal perception techniques based on data from diverse sensors has demonstrated substantial progress. Effectively surpassing the capabilities of state-of-the-art single-modality detectors through sensor fusion remains an active challenge. This work leverages the respective advantages of cameras in perspective view and radars in Bird's Eye View (BEV) to greatly enhance overall detection and tracking performance. Our approach, Camera-Radar Associated Fusion Tracking Booster (CRAFTBooster), represents a pioneering effort to enhance radar-camera fusion in the tracking stage, contributing to improved 3D MOT accuracy. The superior experimental results on the K-Radar dataset, which show a 5-6% gain in IDF1 tracking performance, validate the potential of effective sensor fusion in advancing autonomous driving.
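
For readers unfamiliar with the metric quoted above: IDF1 is the harmonic mean of identity precision and identity recall, which reduces to 2·IDTP / (2·IDTP + IDFP + IDFN). The sketch below computes it from identity-level counts. The function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Return IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN).

    idtp, idfp, idfn are identity-level true positives, false positives,
    and false negatives after the identity matching step of the metric.
    """
    denom = 2 * idtp + idfp + idfn
    # Convention assumed here: an empty sequence scores 0.0.
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, chosen only to show the arithmetic.
score = idf1(idtp=8, idfp=2, idfn=2)  # 16 / 20
```

Because IDF1 rewards keeping the same identity over a whole trajectory, recovering a missed tracklet can raise it even when per-frame detection quality is unchanged.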

Paper Structure (20 sections, 4 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Image-based detections frequently face challenges related to occlusion, as shown in (a), whereas radar-based detections suffer from weak reflectance, as shown in (c). The tracking result shown in (b) is from our method, CRAFTBooster, which addresses these issues through fusion in the tracking stage. Note that the background in (b) is a LiDAR point cloud and the one in (c) is a radar point cloud generated by the Constant False Alarm Rate (CFAR) process; both are shown for demonstration purposes only.
  • Figure 2: CRAFTBooster is a comprehensive multi-modality fusion framework designed for the 3D MOT task based on camera and radar, taking 3D object detection results from the two modalities as its input. The architecture comprises three main components: the Inner-modality Matching Module, the Cross-modality Check Module, and the Multi-modality Fusion Module. Tr. denotes tracklets and Det. denotes detections. The 1$^{st}$ and 2$^{nd}$ modalities can alternate between camera and radar.
  • Figure 3: Cross-modality Check on Unmatched Tracklets. The missing data in unmatched tracklets is recovered from the paired, active tracklet information.
  • Figure 4: Cross-modality Check on Unmatched Detections in Perspective View. Unmatched radar detections are projected to the perspective view and checked against camera detections.
  • Figure 5: Cross-modality Check on Unmatched Detections in BEV. Unmatched camera detections are projected to BEV and checked against radar detections.
  • ...and 4 more figures
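
The perspective-view check described in the Figure 4 caption can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. It assumes the radar detection centre has already been transformed into the camera frame (x right, y down, z forward), uses a standard pinhole model with hypothetical intrinsics `fx, fy, cx, cy`, and substitutes a simple point-in-box test for whatever matching criterion the authors actually use. All function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]      # (x, y, z) in the camera frame
Pixel = Tuple[float, float]               # (u, v) in image coordinates
Box2D = Tuple[float, float, float, float] # (x1, y1, x2, y2) in pixels


def project_to_image(point: Point3D, fx: float, fy: float,
                     cx: float, cy: float) -> Optional[Pixel]:
    """Pinhole projection; returns None for points behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def inside_box(pixel: Pixel, box: Box2D) -> bool:
    """True if the pixel lies inside the axis-aligned 2D box."""
    u, v = pixel
    x1, y1, x2, y2 = box
    return x1 <= u <= x2 and y1 <= v <= y2


def check_radar_detection(point: Point3D, camera_boxes: Sequence[Box2D],
                          fx: float, fy: float,
                          cx: float, cy: float) -> Optional[int]:
    """Return the index of the first camera box that contains the
    projected radar centre, or None if no box supports the detection.
    (Assumed criterion: point-in-box, chosen for illustration only.)"""
    pixel = project_to_image(point, fx, fy, cx, cy)
    if pixel is None:
        return None
    for index, box in enumerate(camera_boxes):
        if inside_box(pixel, box):
            return index
    return None


# Hypothetical example: one radar centre 10 m ahead, two camera boxes.
boxes = [(0.0, 0.0, 100.0, 100.0), (700.0, 380.0, 800.0, 450.0)]
match = check_radar_detection((1.0, 0.5, 10.0), boxes,
                              fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
```

The BEV check in the Figure 5 caption runs in the opposite direction: camera 3D detections are mapped onto the ground plane and compared with radar detections there.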