Using Images from a Video Game to Improve the Detection of Truck Axles
Leandro Arab Marcomini, Andre Luiz Cunha
TL;DR
The study addresses the data bottleneck in training object detectors for truck-axle detection by evaluating synthetic images sourced from a video game against real-world images. It trains 27 YOLO-based networks across three databases (real, synthetic, mixed) and a real testing set, using recall, precision, F1-score, and mAP, with Mann-Whitney U tests for statistical significance. Results show high detection performance with synthetic and mixed data, with mAP reaching up to 99% in some configurations, and no significant differences across datasets or YOLO versions, supporting cross-domain applicability. The work demonstrates that realistic video game imagery can serve as a low-cost, scalable data source for domain-specific axle detection, enabling safer and more diverse training data while maintaining performance.
Abstract
Convolutional Neural Networks (CNNs) traditionally require large amounts of data to train models with good performance. However, data collection is an expensive process, both in time and resources. Generated synthetic images are a good alternative, with video games producing realistic 3D models. This paper aims to determine whether images extracted from a video game can be effectively used to train a CNN to detect real-life truck axles. Three different databases were created, with real-life and synthetic trucks, to provide training and testing examples for three different You Only Look Once (YOLO) architectures. Results were evaluated based on four metrics: recall, precision, F1-score, and mean Average Precision (mAP). To evaluate the statistical significance of the results, the Mann-Whitney U test was also applied to the resulting mAP of all models. Synthetic images from trucks extracted from a video game proved to be a reliable source of training data, contributing to the performance of all networks. The highest mAP score reached 99\%. Results indicate that synthetic images can be used to train neural networks, providing a reliable, low-cost data source for extracting knowledge.
