Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors
Lei Cheng, Arindam Sengupta, Siyang Cao
TL;DR
This work addresses robust multi-object tracking (MOT) for autonomous driving by fusing mmWave radar and camera data in a deep learning-based framework. It combines a FaceNet-inspired appearance embedding with a Bi-LSTM motion model and a tri-output design that produces a radar-only tracker, a camera-only tracker, and a fused tracker, which improves robustness to sensor failures and low-visibility conditions. Key contributions are: a discriminative 128-d appearance embedding trained with triplet loss (pre-trained on CASIA-WebFace and fine-tuned on a custom dataset, reaching 98.65% accuracy at a threshold of 1.09); a Bi-LSTM-based motion predictor; and a dual-cue association step that fuses appearance and position cues via Hungarian matching. Evaluations on controlled parking-lot data and the NuScenes dataset, using the MOTA, MOTP, AMOTA, and AMOTP metrics, show improved MOTP and competitive MOTA at real-time speed (~14 FPS), indicating practical applicability and resilience in adverse sensing conditions.
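The dual-cue association step can be illustrated with a small sketch. It is not the authors' implementation: the cost weighting (`w_app`), the position scale (`pos_scale`), the gating value (`max_cost`), and the dictionary layout of tracks and detections are all hypothetical choices made for illustration. To stay dependency-free, the sketch also replaces the Hungarian algorithm with an exhaustive search over assignments, which returns the same minimum-cost matching but is only practical for a handful of objects.

```python
import math
from itertools import permutations


def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def combined_cost(track, det, w_app=0.5, pos_scale=5.0):
    """Blend appearance and position cues into one cost.

    w_app and pos_scale are illustrative assumptions, not values from the paper.
    """
    appearance = cosine_distance(track["emb"], det["emb"])
    position = math.dist(track["pos"], det["pos"]) / pos_scale
    return w_app * appearance + (1.0 - w_app) * position


def associate(tracks, detections, max_cost=0.7):
    """Return (track_index, detection_index) pairs with minimum total cost.

    Exhaustive search stands in for the Hungarian algorithm here; both solve
    the same assignment problem. Pairs above max_cost are left unmatched.
    """
    cost = [[combined_cost(t, d) for d in detections] for t in tracks]
    n_t, n_d = len(tracks), len(detections)
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), p))
                      for p in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(p, range(n_d)))
                      for p in permutations(range(n_t), n_d))
    best = min(candidates, key=lambda pairs: sum(cost[i][j] for i, j in pairs))
    return [(i, j) for i, j in best if cost[i][j] <= max_cost]


# Toy data: 2-d embeddings instead of 128-d, positions in metres.
tracks = [
    {"emb": [1.0, 0.0], "pos": (0.0, 0.0)},
    {"emb": [0.0, 1.0], "pos": (10.0, 0.0)},
]
detections = [
    {"emb": [0.0, 1.0], "pos": (10.5, 0.0)},
    {"emb": [1.0, 0.0], "pos": (0.5, 0.0)},
    {"emb": [1.0, 1.0], "pos": (50.0, 50.0)},
]
matches = associate(tracks, detections)
```

In this toy case, track 0 pairs with detection 1 and track 1 with detection 0, while the distant third detection stays unmatched and would typically start a new track. For realistic object counts, a polynomial-time solver such as SciPy's `linear_sum_assignment` would be the usual choice.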
Abstract
Autonomous driving holds great promise in addressing traffic safety concerns by leveraging artificial intelligence and sensor technology. Multi-Object Tracking (MOT) plays a critical role in ensuring safer and more efficient navigation through complex traffic scenarios. This paper presents a novel deep learning-based method that integrates radar and camera data to enhance the accuracy and robustness of MOT in autonomous driving systems. The proposed method uses a Bi-directional Long Short-Term Memory (Bi-LSTM) network to incorporate long-term temporal information and improve motion prediction. An appearance feature model inspired by FaceNet is used to associate objects across frames, ensuring consistent tracking. A tri-output mechanism, consisting of individual outputs for the radar and camera sensors plus a fusion output, provides robustness against sensor failures and produces accurate tracking results. In extensive evaluations on real-world datasets, our approach demonstrates remarkable improvements in tracking accuracy and reliable performance even in low-visibility scenarios.
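The FaceNet-inspired appearance model can be made concrete with a minimal sketch of the triplet loss and of threshold-based matching. The FaceNet formulation uses squared L2 distances between L2-normalized embeddings and a margin (0.2 in the FaceNet paper). Applying the 1.09 threshold quoted in the TL;DR to that same squared distance is an assumption here, as are the function names and the 2-d toy embeddings used in place of 128-d vectors.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length, as FaceNet does for its embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0) on normalized embeddings.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least the margin.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)


def same_object(emb_1, emb_2, threshold=1.09):
    """Declare a match when the squared distance falls below the threshold.

    Assumption: the threshold applies to squared L2 distance between
    normalized embeddings; the source does not spell this out.
    """
    dist = squared_distance(l2_normalize(emb_1), l2_normalize(emb_2))
    return dist < threshold


# Toy 2-d embeddings standing in for 128-d vectors.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]   # same object, slightly different view
negative = [0.0, 1.0]   # different object

well_separated = triplet_loss(anchor, positive, negative)
violating = triplet_loss(anchor, negative, positive)
```

A well-separated triplet contributes zero loss, while swapping the positive and negative yields a large loss, which is what pushes embeddings of the same object together during training. At inference time, only the thresholded distance check is needed to decide whether two detections show the same object.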
