Data-Driven Object Tracking: Integrating Modular Neural Networks into a Kalman Framework
Christian Alexander Holz, Christian Bader, Markus Enzweiler, Matthias Drüppel
TL;DR
This work addresses accurate Multi-Object Tracking (MOT) for Advanced Driver Assistance Systems (ADAS) by embedding three lightweight neural networks into a Kalman Filter (KF)-based Tracking-by-Detection framework: SPENT for single-object state prediction, SANT for single-object-to-track association, and MANTa for multi-object-to-track association. All three networks are designed for real-time embedded inference and keep the modular, interpretable KF structure while replacing the traditional prediction and assignment components with data-driven equivalents. Evaluations on the KITTI dataset show that SPENT reduces RMSE by about 50% versus a standard KF (to ~0.029), and that SANT and MANTa achieve up to 95% association accuracy (MANTa performs best in scenarios with 1–6 tracks and averages ~80% across the full dataset). The results indicate that task-specific neural modules can improve tracking performance and robustness without sacrificing modularity or maintainability, enabling adaptable ADAS solutions.
Abstract
This paper presents novel Machine Learning (ML) methodologies for Multi-Object Tracking (MOT), specifically designed to meet the increasing complexity and precision demands of Advanced Driver Assistance Systems (ADAS). We introduce three Neural Network (NN) models that address key challenges in MOT: (i) the Single-Prediction Network (SPENT) for trajectory prediction, (ii) the Single-Association Network (SANT) for mapping an individual Sensor Object (SO) to existing tracks, and (iii) the Multi-Association Network (MANTa) for associating multiple SOs with multiple tracks. These models are integrated into a traditional Kalman Filter (KF) framework and preserve the system's modularity: each one replaces the corresponding component without altering the overall architecture. Importantly, all three networks are designed to run in a real-time, embedded environment; each contains fewer than 50k trainable parameters. Our evaluation on the public KITTI tracking dataset demonstrates significant improvements in tracking performance. SPENT reduces the Root Mean Square Error (RMSE) by 50% compared to a standard KF, while SANT and MANTa achieve up to 95% accuracy in sensor object-to-track assignments. These results underscore the effectiveness of incorporating task-specific NNs into traditional tracking systems, boosting performance and robustness while preserving modularity, maintainability, and interpretability.
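To make the modularity claim concrete, the following is a minimal sketch of a tracking-by-detection loop whose prediction and association steps are pluggable callables. It is not the authors' implementation: the state layout `[x, y, vx, vy]`, the constant-velocity model, the greedy nearest-neighbour association, the simplified measurement update, and all names (`kf_predict`, `greedy_associate`, `ModularTracker`, `DT`, `gate`) are assumptions chosen for illustration. In the paper's setting, a learned predictor such as SPENT would take the place of `predict_fn`, and SANT or MANTa would take the place of `associate_fn`.

```python
import numpy as np

DT = 0.1  # assumed time step between frames, in seconds


def kf_predict(state):
    """Constant-velocity prediction for an assumed state [x, y, vx, vy]."""
    F = np.array([[1.0, 0.0, DT, 0.0],
                  [0.0, 1.0, 0.0, DT],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    return F @ state


def greedy_associate(predicted, detections, gate=2.0):
    """Greedy nearest-neighbour stand-in for an association module.

    Returns a dict {track_index: detection_index}.
    """
    pairs = {}
    if len(predicted) == 0 or len(detections) == 0:
        return pairs
    # Pairwise Euclidean distances between predicted and detected positions.
    dist = np.linalg.norm(predicted[:, None, :2] - detections[None, :, :], axis=-1)
    used = set()
    # Serve tracks in order of their closest available detection.
    for t in np.argsort(dist.min(axis=1)):
        for d in np.argsort(dist[t]):
            if int(d) not in used and dist[t, d] <= gate:
                pairs[int(t)] = int(d)
                used.add(int(d))
                break
    return pairs


class ModularTracker:
    """Tracker whose prediction and association components can be swapped."""

    def __init__(self, predict_fn=kf_predict, associate_fn=greedy_associate):
        self.predict_fn = predict_fn
        self.associate_fn = associate_fn
        self.tracks = np.empty((0, 4))

    def step(self, detections):
        detections = np.asarray(detections, dtype=float).reshape(-1, 2)
        predicted = np.array([self.predict_fn(s) for s in self.tracks]).reshape(-1, 4)
        matches = self.associate_fn(predicted, detections)
        for t in range(len(self.tracks)):
            if t in matches:
                # Simplified update (an assumption): take the measured position
                # and a finite-difference velocity. A full KF would instead
                # blend prediction and measurement using the Kalman gain.
                d = matches[t]
                velocity = (detections[d] - self.tracks[t, :2]) / DT
                self.tracks[t] = np.concatenate([detections[d], velocity])
            else:
                # No measurement: coast on the prediction.
                self.tracks[t] = predicted[t]
        # Unmatched detections start new tracks with zero velocity.
        new = [d for d in range(len(detections)) if d not in matches.values()]
        if new:
            births = np.hstack([detections[new], np.zeros((len(new), 2))])
            self.tracks = np.vstack([self.tracks, births])
        return matches


tracker = ModularTracker()
tracker.step([[0.0, 0.0], [10.0, 10.0]])          # first frame: two tracks are created
matches = tracker.step([[0.1, 0.0], [10.0, 10.1]])  # second frame: detections are matched
```

Because the tracker only depends on the call signatures of `predict_fn` and `associate_fn`, either component can be exchanged independently, which is the property the abstract describes as preserving modularity.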
