Table of Contents
Fetching ...

Using Images from a Video Game to Improve the Detection of Truck Axles

Leandro Arab Marcomini, Andre Luiz Cunha

TL;DR

The study addresses the data bottleneck in training object detectors for truck-axle detection by evaluating synthetic images sourced from a video game against real-world images. It trains 27 YOLO-based networks across three databases (real, synthetic, mixed) and a real testing set, using recall, precision, F1-score, and mAP, with Mann-Whitney U tests for statistical significance. Results show high detection performance with synthetic and mixed data, with mAP reaching up to 99% in some configurations, and no significant differences across datasets or YOLO versions, supporting cross-domain applicability. The work demonstrates that realistic video game imagery can serve as a low-cost, scalable data source for domain-specific axle detection, enabling safer and more diverse training data while maintaining performance.

Abstract

Convolutional Neural Networks (CNNs) traditionally require large amounts of data to train models with good performance. However, data collection is an expensive process, both in time and resources. Generated synthetic images are a good alternative, with video games producing realistic 3D models. This paper aims to determine whether images extracted from a video game can be effectively used to train a CNN to detect real-life truck axles. Three different databases were created, with real-life and synthetic trucks, to provide training and testing examples for three different You Only Look Once (YOLO) architectures. Results were evaluated based on four metrics: recall, precision, F1-score, and mean Average Precision (mAP). To evaluate the statistical significance of the results, the Mann-Whitney U test was also applied to the resulting mAP of all models. Synthetic images from trucks extracted from a video game proved to be a reliable source of training data, contributing to the performance of all networks. The highest mAP score reached 99\%. Results indicate that synthetic images can be used to train neural networks, providing a reliable, low-cost data source for extracting knowledge.

Using Images from a Video Game to Improve the Detection of Truck Axles

TL;DR

The study addresses the data bottleneck in training object detectors for truck-axle detection by evaluating synthetic images sourced from a video game against real-world images. It trains 27 YOLO-based networks across three databases (real, synthetic, mixed) and a real testing set, using recall, precision, F1-score, and mAP, with Mann-Whitney U tests for statistical significance. Results show high detection performance with synthetic and mixed data, with mAP reaching up to 99% in some configurations, and no significant differences across datasets or YOLO versions, supporting cross-domain applicability. The work demonstrates that realistic video game imagery can serve as a low-cost, scalable data source for domain-specific axle detection, enabling safer and more diverse training data while maintaining performance.

Abstract

Convolutional Neural Networks (CNNs) traditionally require large amounts of data to train models with good performance. However, data collection is an expensive process, both in time and resources. Generated synthetic images are a good alternative, with video games producing realistic 3D models. This paper aims to determine whether images extracted from a video game can be effectively used to train a CNN to detect real-life truck axles. Three different databases were created, with real-life and synthetic trucks, to provide training and testing examples for three different You Only Look Once (YOLO) architectures. Results were evaluated based on four metrics: recall, precision, F1-score, and mean Average Precision (mAP). To evaluate the statistical significance of the results, the Mann-Whitney U test was also applied to the resulting mAP of all models. Synthetic images from trucks extracted from a video game proved to be a reliable source of training data, contributing to the performance of all networks. The highest mAP score reached 99\%. Results indicate that synthetic images can be used to train neural networks, providing a reliable, low-cost data source for extracting knowledge.

Paper Structure

This paper contains 13 sections, 14 figures, 6 tables.

Figures (14)

  • Figure 1: Example of synthetic and real-world data. The truck on the top is a rendered 3D model. The truck on the bottom is a real-life truck.
  • Figure 2: CNN architecture, with feature extraction happening on the convolution layers, and classification provided by a fully connected MLP. Source: cnnarq.
  • Figure 3: YOLOv3 layers and architecture, with several convolution layers and different classification scales to detect different-sized objects. Source: adapted from ammar2021vehicle.
  • Figure 4: The process of using neural networks and the importance of data collecting. Source: adapted from whang2023data.
  • Figure 5: 3D model of a truck from the video game Euro Truck Simulator 2.
  • ...and 9 more figures