Digital twins to alleviate the need for real field data in vision-based vehicle speed detection systems

Antonio Hernández Martínez, Iván García Daza, Carlos Fernández López, David Fernández Llorca

TL;DR

This work tackles the data bottleneck in vision-based vehicle speed estimation by constructing a digital twin of a real camera in the CARLA simulator to generate a large, diverse synthetic dataset. A ResNet-3D speed regressor is trained on this digital twin and evaluated on real urban sequences, with results showing that near-perfect camera pose alignment is critical for successful transfer. The digital-twin approach achieves a real-data MAE as low as ~2.66 km/h (Digital Twin Large), significantly improving over other synthetic baselines and eliminating the need for extensive real-field data. The findings highlight the practical potential of digital twins to enable cost-efficient, robust speed estimation for traffic enforcement and safety applications, while outlining future work on dataset expansion and tracking improvements.

Abstract

Accurate vision-based speed estimation is much more cost-effective than traditional methods based on radar or LiDAR. However, it is also challenging due to the limitations of perspective projection on a discrete sensor, as well as the high sensitivity to calibration, lighting and weather conditions. Interestingly, deep learning approaches (which dominate the field of computer vision) are very limited in this context due to the lack of available data. Indeed, obtaining video sequences of real road traffic with accurate speed values associated with each vehicle is very complex and costly, and the number of available datasets is very limited. Recently, some approaches have focused on the use of synthetic data. However, it is still unclear how models trained on synthetic data can be effectively applied to real-world conditions. In this work, we propose the use of digital twins built with the CARLA simulator to generate a large dataset representative of a specific real-world camera. The synthetic dataset contains a large variability of vehicle types, colours, speeds, lighting and weather conditions. A 3D CNN model is trained on the digital twin and tested on the real sequences. Unlike previous approaches that generate multi-camera sequences, we found that the gap between the real and the virtual conditions is a key factor in obtaining low speed estimation errors. Even with a preliminary approach, the mean absolute error obtained remains below 3 km/h.
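
For readers unfamiliar with the metric, the mean absolute error (MAE) quoted above is the average of the absolute differences between estimated and ground-truth speeds. The following is a minimal sketch of that computation. The function name and the speed values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def mean_absolute_error(predicted_kmh, true_kmh):
    """Average absolute difference between predicted and true speeds (km/h)."""
    if len(predicted_kmh) != len(true_kmh) or not true_kmh:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(predicted_kmh, true_kmh))
    return total / len(true_kmh)


# Hypothetical speeds for four vehicles, NOT data from the paper.
predicted = [48.0, 52.5, 31.0, 60.0]      # model estimates (km/h)
ground_truth = [50.0, 50.0, 30.0, 63.0]   # reference speeds (km/h)

mae = mean_absolute_error(predicted, ground_truth)
print(f"MAE = {mae:.3f} km/h")
```

An MAE below 3 km/h therefore means that, averaged over all test vehicles, the estimate deviates from the reference speed by less than 3 km/h in either direction.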

Paper Structure (13 sections, 1 equation, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Overview of the presented approach. A digital twin is generated from the camera calibration parameters using the CARLA simulator [carla2017]. Then, a synthetic dataset is created and used to train and validate a 3D CNN model, which is directly used for speed detection in the real world.
  • Figure 2: First row: Real sequence. Second row: Simulated sequence. Third row: Overlapped sequences.
  • Figure 3: Dimensions of the inductive loops.
  • Figure 4: Digital Twin after the calibration process.
  • Figure 5: Overall view of the 3D CNN network architecture. The 4D input tensor contains a sequence of 16 RGB frames (3$\times$16) spanning 1.03 seconds, with an image size of 112$\times$112 pixels.
  • ...and 5 more figures
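
The input layout given in the Figure 5 caption can be made concrete with a little arithmetic. The sketch below only restates the caption's numbers (3 channels, 16 frames, 112$\times$112 pixels, 1.03 seconds). The channel-first ordering and the frame-rate estimate are assumptions, since the caption specifies neither the axis order nor how the 1.03 seconds is measured.

```python
import math

# Clip dimensions taken from the Figure 5 caption.
CHANNELS, FRAMES, HEIGHT, WIDTH = 3, 16, 112, 112
CLIP_DURATION_S = 1.03

# Assumption: channel-first axis order (channels, frames, height, width).
clip_shape = (CHANNELS, FRAMES, HEIGHT, WIDTH)

# Number of scalar values the network receives per clip.
values_per_clip = math.prod(clip_shape)

# Assumption: the 1.03 s duration covers 16 equal frame periods.
approx_fps = FRAMES / CLIP_DURATION_S

print("clip shape:", clip_shape)
print("values per clip:", values_per_clip)
print(f"approximate frame rate: {approx_fps:.1f} fps")
```

Under these assumptions each training sample is a short clip of roughly one second, so the regressor has to infer speed from the vehicle's apparent motion across 16 consecutive frames.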