
Multi-Object Tracking with Camera-LiDAR Fusion for Autonomous Driving

Riccardo Pieroni, Simone Specchia, Matteo Corno, Sergio Matteo Savaresi

TL;DR

The paper tackles robust multi-object tracking (MOT) for autonomous driving by fusing camera and LiDAR observations in a map-free setting. It proposes a four-block MOT pipeline in which a camera-based 3D detector and LiDAR clustering feed a three-step data association (LiDAR-track, Camera-track, Camera-LiDAR-track) that initializes and updates tracks. Each track is estimated by an Extended Kalman Filter (EKF) with a Constant Turn Rate and Velocity (CTRV) motion model expressed in the ego frame and flexible measurement functions. Key contributions include a CTRV EKF that estimates each object's absolute longitudinal velocity and yaw rate without external position references, and a three-step data association scheme that adapts the measurements used to update each track. Validation on KITTI MOT and real-world tests shows that the multi-modal approach outperforms single-modality baselines and is robust across different LiDAR configurations, enabling accurate, map-free MOT for autonomous driving scenarios.
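
The three association steps can be illustrated with a minimal Python sketch. It is not the authors' method: the greedy nearest-neighbour rule, the Euclidean gate of 2 m, the use of 2D positions only, and the function names greedy_match and three_step_association are all assumptions made for illustration. In this sketch, step 1 matches LiDAR clusters to tracks, step 2 matches camera detections to tracks (so a track may receive both sensors), and step 3 pairs leftover camera detections with leftover LiDAR clusters as candidates for new tracks.

```python
import math


def greedy_match(a, b, gate):
    """Greedily pair 2D points in a and b by increasing Euclidean distance.

    Pairs farther apart than gate are rejected. Returns (pairs, free_a, free_b)
    where pairs holds (index_in_a, index_in_b) tuples.
    """
    candidates = sorted(
        (math.dist(p, q), i, j) for i, p in enumerate(a) for j, q in enumerate(b)
    )
    used_a, used_b, pairs = set(), set(), []
    for dist, i, j in candidates:
        if dist > gate:
            break  # sorted, so every remaining pair is outside the gate too
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j))
    free_a = [i for i in range(len(a)) if i not in used_a]
    free_b = [j for j in range(len(b)) if j not in used_b]
    return pairs, free_a, free_b


def three_step_association(tracks, lidar, camera, gate=2.0):
    """Hypothetical three-step scheme; all inputs are lists of (x, y) positions."""
    # Step 1: LiDAR clusters vs existing tracks.
    lidar_pairs, _, free_lidar = greedy_match(tracks, lidar, gate)
    # Step 2: camera detections vs existing tracks (independent of step 1).
    cam_pairs, _, free_cam = greedy_match(tracks, camera, gate)
    # Record which sensors observed each track; an EKF could use this set
    # to choose its measurement function for the update.
    sources = {i: set() for i in range(len(tracks))}
    for t, _ in lidar_pairs:
        sources[t].add("lidar")
    for t, _ in cam_pairs:
        sources[t].add("camera")
    # Step 3: leftover camera detections vs leftover LiDAR clusters.
    cam_left = [camera[j] for j in free_cam]
    lidar_left = [lidar[j] for j in free_lidar]
    new_pairs, _, _ = greedy_match(cam_left, lidar_left, gate)
    births = [(free_cam[i], free_lidar[j]) for i, j in new_pairs]
    return sources, births


# Usage: track 0 is seen by both sensors, track 1 by neither, and a new
# object at about (40, -3) is seen by both but not yet tracked.
sources, births = three_step_association(
    tracks=[(10.0, 0.0), (20.0, 5.0)],
    lidar=[(10.3, 0.1), (40.0, -3.0)],
    camera=[(9.8, -0.2), (39.5, -3.2)],
)
# sources[0] == {"lidar", "camera"}, sources[1] == set(), births == [(1, 1)]
```

In this sketch, track 0 would be updated with a combined camera and LiDAR measurement, track 1 would only be predicted, and the camera-LiDAR pair (1, 1) would seed a new track. A globally optimal assignment (for example, the Hungarian algorithm) could replace the greedy rule without changing the three-step structure.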

Abstract

This paper presents a novel multi-modal Multi-Object Tracking (MOT) algorithm for self-driving cars that combines camera and LiDAR data. Camera frames are processed with a state-of-the-art 3D object detector, whereas classical clustering techniques are used to process LiDAR observations. The proposed MOT algorithm comprises a three-step association process, an Extended Kalman Filter (EKF) for estimating the motion of each detected dynamic obstacle, and a track management phase. The EKF motion model requires the current measured relative position and orientation of the observed object and the longitudinal and angular velocities of the ego vehicle as inputs. Unlike most state-of-the-art multi-modal MOT approaches, the proposed algorithm does not rely on maps or knowledge of the ego global pose. Moreover, it uses a 3D detector only for camera data and is agnostic to the type of LiDAR sensor used. The algorithm is validated both in simulation and with real-world data, with satisfactory results.
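
The claim that ego longitudinal and angular velocities suffice as inputs, without a map or global pose, can be made concrete by writing CTRV kinematics directly in the moving ego frame. The Python sketch below shows one plausible EKF prediction step under stated assumptions; it is not the authors' implementation. The state layout [x, y, psi, v, omega] (relative position and heading, absolute speed and yaw rate), the forward-Euler discretization, the diagonal process noise, and all function names are hypothetical, and the paper's exact integration and noise model may differ.

```python
import math


def ctrv_ego_step(state, v_ego, w_ego, dt):
    """One forward-Euler step of a CTRV model expressed in the moving ego frame.

    state = [x, y, psi, v, omega]: object position and heading relative to the
    ego vehicle, plus the object's absolute speed and yaw rate.
    v_ego, w_ego: ego longitudinal speed and yaw rate (the model inputs).
    """
    x, y, psi, v, omega = state
    # Object velocity minus ego translation, plus the apparent motion
    # caused by the ego frame rotating at w_ego.
    x_dot = v * math.cos(psi) - v_ego + w_ego * y
    y_dot = v * math.sin(psi) - w_ego * x
    psi_dot = omega - w_ego  # relative heading drifts by the yaw-rate difference
    # v and omega are constant under CTRV.
    return [x + dt * x_dot, y + dt * y_dot, psi + dt * psi_dot, v, omega]


def ctrv_ego_jacobian(state, w_ego, dt):
    """Jacobian of ctrv_ego_step with respect to the state."""
    _, _, psi, v, _ = state
    c, s = math.cos(psi), math.sin(psi)
    return [
        [1.0, dt * w_ego, -dt * v * s, dt * c, 0.0],
        [-dt * w_ego, 1.0, dt * v * c, dt * s, 0.0],
        [0.0, 0.0, 1.0, 0.0, dt],
        [0.0, 0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def ekf_predict(state, cov, v_ego, w_ego, dt, q_diag):
    """EKF prediction: x' = f(x, u) and P' = F P F^T + Q with diagonal Q."""
    f_jac = ctrv_ego_jacobian(state, w_ego, dt)  # linearize at the prior state
    new_state = ctrv_ego_step(state, v_ego, w_ego, dt)
    fpft = matmul(matmul(f_jac, cov), transpose(f_jac))
    new_cov = [[fpft[i][j] + (q_diag[i] if i == j else 0.0) for j in range(5)]
               for i in range(5)]
    return new_state, new_cov


# Usage: a parked car 10 m ahead while the ego drives straight at 10 m/s.
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
state, cov = ekf_predict([10.0, 0.0, 0.0, 0.0, 0.0], identity, 10.0, 0.0, 0.1, [0.0] * 5)
# state[0] is 9.0: the gap closes at the ego speed.
```

Because v and omega in this sketch are the object's absolute speed and yaw rate, relative position and heading measurements plus ego odometry are enough to observe them. A matching update step would use a measurement function that selects x, y and psi, or only x and y if a sensor provides no orientation.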

Paper Structure

The paper contains 13 sections, 6 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Schematic representation of the proposed MOT algorithm.
  • Figure 2: Output of the camera processing module.
  • Figure 3: Output example of the LiDAR processing module.
  • Figure 4: Track state estimation error. The four plots show the position, orientation, speed, and yaw-rate estimation errors, respectively.