UrbanTwin: Building High-Fidelity Digital Twins for Sim2Real LiDAR Perception and Evaluation

Muhammad Shahbaz, Shaurya Agarwal

TL;DR

The paper tackles the high cost and limited realism of LiDAR datasets for ITS by presenting a reproducible workflow to build High-Fidelity Digital Twins (HiFi DTs) anchored to real geospatial data and integrated with CARLA and RoadRunner. It details a five-stage methodology, whose stages include site analysis, geometric reconstruction, and simulation-ready map modeling, and demonstrates fidelity improvements via quantitative metrics. The workflow is used to generate synthetic LiDAR datasets (UT-LUMPI, UT-V2X-Real, UT-TUMTraf-I) that closely match real-world distributions and can outperform real-data-trained baselines in perception tasks. The work also outlines practical AI applications, including scalable supervision, rare-event stress testing, sensor co-design, and privacy-preserving benchmarking, illustrating the broad utility of HiFi DTs in LiDAR perception research. By providing a modular, reproducible pipeline, the approach supports wider adoption of Sim2Real learning in ITS and scalable, scenario-rich data generation for training and evaluation.

Abstract

LiDAR-based perception in intelligent transportation systems (ITS) relies on deep neural networks trained with large-scale labeled datasets. However, creating such datasets is expensive, time-consuming, and labor-intensive, limiting the scalability of perception systems. Sim2Real learning offers a scalable alternative, but its success depends on the simulation's fidelity to real-world environments, dynamics, and sensors. This tutorial introduces a reproducible workflow for building high-fidelity digital twins (HiFi DTs) to generate realistic synthetic datasets. We outline practical steps for modeling static geometry, road infrastructure, and dynamic traffic using open-source resources such as satellite imagery, OpenStreetMap, and sensor specifications. The resulting environments support scalable and cost-effective data generation for robust Sim2Real learning. Using this workflow, we have released three synthetic LiDAR datasets, namely UT-LUMPI, UT-V2X-Real, and UT-TUMTraf-I, which closely replicate real locations and outperform real-data-trained baselines in perception tasks. This guide enables broader adoption of HiFi DTs in ITS research and deployment.

Paper Structure

This paper contains 9 sections, 8 figures.

Figures (8)

  • Figure 1: An overview of the Framework for High-Fidelity Digital-Twin Modeling
  • Figure 2: Left: A raw 3D mesh reconstruction of the real location. Right: The reconstruction after being cleaned in Blender and cropped to the region of interest.
  • Figure 3: Left: Roads are designed in RoadRunner, constrained by lane-level elevation and superelevation. Middle: Special care is taken in modeling the road network and traffic maneuvers at intersections. Right: The digital twin is built as a closed loop so that the traffic distribution remains fixed within the region.
  • Figure 4: Left: A perspective view of the HiFi DT in CARLA. Middle: Top view of the same DT. The red lines are the traffic paths. The DT is constructed as a loop to constrain the actor distribution in the simulation. Right: A close-up of how traffic maneuvers are constructed at an intersection. The small black spheres show traffic spawn points.
  • Figure 5: Snapshots of the simulation in action. Left: Vehicles at an intersection. The traffic follows the signal routine created in RoadRunner. Middle: A road scene showing mixed traffic (bikes, cars, trucks). Right: Top view of the entire traffic scene. The loop in the DT makes it easier to spawn the traffic once and then record data without respawning.
  • ...and 3 more figures