Data-Driven Object Tracking: Integrating Modular Neural Networks into a Kalman Framework

Christian Alexander Holz, Christian Bader, Markus Enzweiler, Matthias Drüppel

TL;DR

This work tackles accurate Multi-Object Tracking (MOT) for Advanced Driver Assistance Systems (ADAS) by embedding three lightweight neural networks into a Kalman Filter (KF)-based Tracking-by-Detection framework: SPENT for single-object state prediction, SANT for single-object-to-track association, and MANTa for multi-object-to-track association. All three networks are designed for real-time embedded inference; they replace the traditional prediction and assignment components with data-driven equivalents while preserving the modular, interpretable KF structure. On the KITTI dataset, SPENT reduces RMSE by about 50% versus a standard KF (to ~0.029), and SANT and MANTa achieve up to 95% association accuracy; MANTa performs best in scenarios with 1–6 tracks and averages ~80% across the full dataset. The results indicate that task-specific neural modules can improve tracking performance and robustness without sacrificing modularity or maintainability, which supports adaptable ADAS solutions.

Abstract

This paper presents novel Machine Learning (ML) methodologies for Multi-Object Tracking (MOT), specifically designed to meet the increasing complexity and precision demands of Advanced Driver Assistance Systems (ADAS). We introduce three Neural Network (NN) models that address key challenges in MOT: (i) the Single-Prediction Network (SPENT) for trajectory prediction, (ii) the Single-Association Network (SANT) for mapping an individual Sensor Object (SO) to existing tracks, and (iii) the Multi-Association Network (MANTa) for associating multiple SOs with multiple tracks. These models are integrated into a traditional Kalman Filter (KF) framework and maintain the system's modularity by replacing the relevant components without disrupting the overall architecture. Importantly, all three networks are designed to run in a real-time, embedded environment, and each contains fewer than 50k trainable parameters. Our evaluation, conducted on the public KITTI tracking dataset, demonstrates significant improvements in tracking performance. SPENT reduces the Root Mean Square Error (RMSE) by 50% compared to a standard KF, while SANT and MANTa achieve up to 95% accuracy in sensor object-to-track assignments. These results underscore the effectiveness of incorporating task-specific NNs into traditional tracking systems, boosting performance and robustness while preserving modularity, maintainability, and interpretability.
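
The modularity claim can be made concrete with a minimal Python sketch of one Tracking-by-Detection step in which the prediction and association components are interchangeable callables. Everything in it is an assumption made for illustration and is not taken from the paper: the state layout `(x, y, vx, vy)`, the `Tracker` class, the constant-velocity predictor, and the greedy gated nearest-neighbour associator. The two functions are classical stand-ins occupying the slots that SPENT and SANT/MANTa would fill; the KF measurement update is omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical layouts, chosen only for this sketch.
State = Tuple[float, float, float, float]   # (x, y, vx, vy)
Detection = Tuple[float, float]             # (x, y)


def constant_velocity_predict(state: State, dt: float) -> State:
    """Classical stand-in for the prediction slot (the role SPENT takes)."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)


def nearest_neighbour_associate(
    predicted: List[State], detections: List[Detection], gate: float
) -> List[Optional[int]]:
    """Classical stand-in for the association slot (the role SANT/MANTa take).

    Greedy: each detection takes the closest still-unused track within
    `gate`; a detection with no track inside the gate maps to None.
    """
    used = set()
    assignment: List[Optional[int]] = []
    for zx, zy in detections:
        best, best_dist = None, gate
        for i, (x, y, _, _) in enumerate(predicted):
            if i in used:
                continue
            dist = math.hypot(zx - x, zy - y)
            if dist <= best_dist:
                best, best_dist = i, dist
        if best is not None:
            used.add(best)
        assignment.append(best)
    return assignment


@dataclass
class Tracker:
    """One tracking step with swappable prediction/association components."""
    predict: Callable[[State, float], State]
    associate: Callable[[List[State], List[Detection], float], List[Optional[int]]]
    gate: float = 2.0

    def step(self, tracks: List[State], detections: List[Detection], dt: float):
        predicted = [self.predict(s, dt) for s in tracks]
        assignment = self.associate(predicted, detections, self.gate)
        return predicted, assignment  # the KF update step would follow here


tracker = Tracker(constant_velocity_predict, nearest_neighbour_associate)
tracks = [(0.0, 0.0, 1.0, 0.0), (10.0, 0.0, 0.0, 1.0)]
detections = [(10.2, 1.1), (0.9, 0.1), (50.0, 50.0)]
predicted, assignment = tracker.step(tracks, detections, dt=1.0)
print(assignment)  # [1, 0, None]
```

Because `Tracker` depends only on the two call signatures, a learned predictor or associator with the same interface could be substituted without touching the rest of the loop, which is the kind of component replacement the abstract describes.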

Paper Structure

This paper contains 21 sections, 6 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: This schematic representation shows the integration of two NNs (highlighted in dark blue) within a TbD framework. The "Association network" can be implemented using either SANT or MANTa. It works in tandem with the prediction network SPENT, which takes the tracked objects ${T^c}_{t,1:n}$ and predicts them to the next timestamp as $X_{t,1:n}$. The predicted objects are then associated with the sensor observations $Z_{t,1:m}$ by SANT or MANTa.
  • Figure 2: Analysis of sequence padding: unsorted vs. sorted data. This figure illustrates the impact of sequence padding on LSTM training based on the sorting of input data. The upper panel shows that unsorted data requires extensive padding to equalize batch sequence lengths, increasing computational overhead. In contrast, the lower panel demonstrates that sorting data by length before batching significantly reduces the necessary padding.
  • Figure 3: Schematic representation of the generic structure of SPENT.
  • Figure 4: Schematic representation of the network structure of SANT. Here $m=1$, so one SO is associated with $n$ existing tracks.
  • Figure 5: MANTa data structure. The non-noisy SOs are shown to allow a visual assignment and to make the association procedure easier to follow. Seven tracks are extracted from the KITTI dataset for the given timestamp of sequence 20. Eight sensor objects are generated in pseudo-random order. The one-hot vector encodes the ground-truth (GT) assignment of the first sensor object to the track at position two.
  • ...and 2 more figures
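
The padding effect described in the Figure 2 caption can be illustrated with a small Python sketch. The sequence lengths and the batch size are made-up values, and `padding_needed` is a hypothetical helper; the sketch only counts padded time steps under the assumption that every batch is padded to its longest sequence, and does not reproduce the paper's training pipeline.

```python
from typing import Sequence


def padding_needed(lengths: Sequence[int], batch_size: int) -> int:
    """Total number of padded time steps when consecutive sequences are
    grouped into batches and each batch is padded to its longest member."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        total += sum(longest - n for n in batch)
    return total


# Hypothetical track lengths (in time steps), mixing short and long tracks.
lengths = [3, 40, 5, 38, 4, 41, 6, 39]

unsorted_pad = padding_needed(lengths, batch_size=4)
sorted_pad = padding_needed(sorted(lengths), batch_size=4)

print(unsorted_pad)  # 148
print(sorted_pad)    # 12
```

With these example values, sorting by length before batching cuts the padding from 148 to 12 time steps, because short and long sequences no longer share a batch.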