UrbanTwin: Building High-Fidelity Digital Twins for Sim2Real LiDAR Perception and Evaluation
Muhammad Shahbaz, Shaurya Agarwal
TL;DR
The paper addresses the high cost and limited realism of LiDAR datasets for intelligent transportation systems (ITS) by presenting a reproducible workflow for building High-Fidelity Digital Twins (HiFi DTs) anchored to real geospatial data and integrated with CARLA and RoadRunner. It details a five-stage methodology, including site analysis, geometric reconstruction, and simulation-ready map modeling, and demonstrates fidelity improvements using quantitative metrics. The resulting synthetic LiDAR datasets (UT-LUMPI, UT-V2X-Real, UT-TUMTraf-I) closely match real-world distributions and can outperform real-data-trained baselines in perception tasks. The work also outlines practical AI applications of HiFi DTs in LiDAR perception research: scalable supervision, rare-event stress testing, sensor co-design, and privacy-preserving benchmarking. Because the pipeline is modular and reproducible, it lowers the barrier to adopting Sim2Real learning in ITS and supports scalable, scenario-rich data generation for training and evaluation.
Abstract
LiDAR-based perception in intelligent transportation systems (ITS) relies on deep neural networks trained with large-scale labeled datasets. However, creating such datasets is expensive, time-consuming, and labor-intensive, limiting the scalability of perception systems. Sim2Real learning offers a scalable alternative, but its success depends on the simulation's fidelity to real-world environments, dynamics, and sensors. This tutorial introduces a reproducible workflow for building high-fidelity digital twins (HiFi DTs) to generate realistic synthetic datasets. We outline practical steps for modeling static geometry, road infrastructure, and dynamic traffic using open-source resources such as satellite imagery, OpenStreetMap, and sensor specifications. The resulting environments support scalable and cost-effective data generation for robust Sim2Real learning. Using this workflow, we have released three synthetic LiDAR datasets, namely UT-LUMPI, UT-V2X-Real, and UT-TUMTraf-I, which closely replicate real locations and outperform real-data-trained baselines in perception tasks. This guide enables broader adoption of HiFi DTs in ITS research and deployment.
