UrbanTwin: Synthetic LiDAR Datasets (LUMPI, V2X-Real-IC, and TUMTraf-I)

Muhammad Shahbaz, Shaurya Agarwal

TL;DR

UrbanTwin addresses the data bottleneck in roadside lidar perception by creating high-fidelity synthetic replicas of established benchmarks through digital-twin modeling. The three UrbanTwin (UT) datasets replicate real-world geometry, sensor specifications, and traffic dynamics to achieve strong structural and distributional alignment with real data. Experiments show that models trained only on synthetic data can generalize to real scenes, sometimes outperforming models trained on real data. This offers a scalable source of augmentation data and potential cost savings. The work paves the way for broader sim-to-real research in intelligent transportation systems (ITS). Future plans include adding pedestrians and other vulnerable road users (VRUs) and performing cross-site validation.

Abstract

This article presents the UrbanTwin datasets, high-fidelity, realistic replicas of three public roadside lidar datasets: LUMPI, V2X-Real-IC, and TUMTraf-I. Each UrbanTwin dataset contains 10K annotated frames corresponding to one of the public datasets. Annotations include 3D bounding boxes, instance segmentation labels, and tracking IDs for six object classes, along with semantic segmentation labels for nine classes. The datasets are synthesized using emulated lidar sensors within realistic digital twins. Each twin is modeled on the actual location behind the corresponding real dataset: its surrounding geometry, lane-level road alignment, and the lane topology and vehicle movement patterns at its intersections. Because of this precise digital-twin modeling, the synthetic datasets are well aligned with their real counterparts. They offer strong standalone and augmentative value for training deep learning models on tasks such as 3D object detection, tracking, and semantic and instance segmentation. We evaluate the alignment of the synthetic replicas against real data through statistical and structural similarity analysis. We further demonstrate their utility by training 3D object detection models solely on synthetic data and testing them on real, unseen data. The high similarity scores, together with detection performance that improves on models trained on real data, indicate that the UrbanTwin datasets effectively enhance existing benchmark datasets by increasing sample size and scene diversity. In addition, the digital twins can be adapted to test custom scenarios by modifying the design and dynamics of the simulations. To our knowledge, these are the first digitally synthesized datasets that can replace in-domain real-world datasets for lidar perception tasks. UrbanTwin datasets are publicly available at https://dataverse.harvard.edu/dataverse/ucf-ut.
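The sketch below gives a rough picture of the frame-level statistics that can feed this kind of statistical alignment check. It computes per-frame means for points, 3D boxes, and distinct object categories for a real and a synthetic split. These are three of the metrics named in the Figure 5 caption; the fourth is not named there, so it is left out. It then scales each metric so that the larger of the two means equals 1. The `Frame` record, the function names, and the max-based normalization are all assumptions made for illustration. The paper's exact metric definitions and normalization may differ.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Frame:
    """Hypothetical per-frame summary: point count and the class of each 3D box."""
    num_points: int
    box_classes: list = field(default_factory=list)


def frame_level_means(frames):
    """Mean points, boxes, and distinct object categories per frame."""
    points = np.array([f.num_points for f in frames], dtype=float)
    boxes = np.array([len(f.box_classes) for f in frames], dtype=float)
    categories = np.array([len(set(f.box_classes)) for f in frames], dtype=float)
    return {
        "points_per_frame": float(points.mean()),
        "boxes_per_frame": float(boxes.mean()),
        "categories_per_frame": float(categories.mean()),
    }


def normalize_pair(real, synth):
    """Scale each metric so the larger of the two means is 1.0 (an assumed scheme)."""
    normalized = {}
    for key in real:
        scale = max(real[key], synth[key]) or 1.0  # guard against 0/0
        normalized[key] = (real[key] / scale, synth[key] / scale)
    return normalized


# Toy data only; these are not statistics from any UrbanTwin dataset.
real_frames = [Frame(1000, ["car", "car", "truck"]), Frame(1200, ["car"])]
synth_frames = [Frame(1100, ["car", "bus", "pedestrian"]), Frame(1050, ["car", "truck"])]

norm = normalize_pair(frame_level_means(real_frames), frame_level_means(synth_frames))
for metric, (r, s) in norm.items():
    print(f"{metric}: real={r:.2f} synthetic={s:.2f}")
```

In this toy example, the points and boxes per frame are close across the two splits while the synthetic split has more categories per frame. That is the qualitative pattern the Figure 5 caption describes.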

Paper Structure

This paper contains 10 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: Lidar data synthesis from realistic digital twins yields close-to-real data. Left: A point cloud frame from the real-world V2X-Real dataset (xiang2024v2x). Right: A digital-twin-synthesized point cloud for the same dataset. Can you notice a high similarity?
  • Figure 2: UrbanTwin datasets support all four major tasks for lidar-based perception. They provide 3D bounding boxes for object detection, object IDs for tracking, and KITTI-style (geiger2013vision) labels for both semantic-level and instance-level segmentation. Each frame includes up to 6 object classes and up to 9 semantic-level categories.
  • Figure 3: Overview of Synthetic Dataset Generation. Top: First, a 3D model of a real-world scene is created from publicly available information, including road network information from OpenStreetMap, satellite imagery, etc. The model is then fine-aligned to the real point cloud frame, and the positions of the lidar sensors are embedded. Finally, the road is constructed. Bottom: The simulation is modeled on the real-world traffic dynamics, and all assets are imported into a custom CARLA map. Traffic is generated stochastically but conforms statistically to the target real dataset. Finally, the point cloud data and label information are stored.
  • Figure 4: A Qualitative Comparison of Real vs. Synthetic Data. Point clouds generated through digital-twin-based simulations visually resemble the real point clouds captured by physical lidar sensors.
  • Figure 5: Normalized Frame-Level Means for 4 Key Metrics. Left: LUMPI, Middle: V2X-Real, Right: TUMTraf-I. The synthetic datasets are generated to match the real ones in points per frame and boxes (objects) per frame. However, they contain more object categories per frame. This makes the class distribution more even while keeping it comparable to the classes of the original datasets.
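Figure 2 states that the segmentation labels follow the KITTI style. Suppose this means the SemanticKITTI convention, which is an assumption, not something confirmed here. Under that convention, each frame has a float32 `.bin` scan of (x, y, z, intensity) points and a `.label` file with one little-endian uint32 per point. The lower 16 bits hold the semantic class and the upper 16 bits hold the instance ID. A minimal loader under that assumption might look like the sketch below. The file layout and helper names are hypothetical. The mapping from integer IDs to the nine semantic classes is dataset-specific and not given here.

```python
import numpy as np

# Assumption: SemanticKITTI-style packing, one uint32 per point.
SEMANTIC_MASK = 0xFFFF  # lower 16 bits hold the semantic class ID
INSTANCE_SHIFT = 16     # upper 16 bits hold the instance ID


def decode_labels(raw: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split packed uint32 labels into (semantic_id, instance_id) arrays."""
    raw = np.asarray(raw, dtype=np.uint32)
    semantic = (raw & SEMANTIC_MASK).astype(np.uint16)
    instance = (raw >> INSTANCE_SHIFT).astype(np.uint16)
    return semantic, instance


def encode_labels(semantic, instance) -> np.ndarray:
    """Inverse of decode_labels; handy for writing or testing label files."""
    semantic = np.asarray(semantic, dtype=np.uint32)
    instance = np.asarray(instance, dtype=np.uint32)
    return (instance << INSTANCE_SHIFT) | (semantic & SEMANTIC_MASK)


def load_scan_and_labels(bin_path: str, label_path: str):
    """Hypothetical loader for one frame: points (N x 4) plus per-point IDs."""
    points = np.fromfile(bin_path, dtype="<f4").reshape(-1, 4)  # x, y, z, intensity
    raw = np.fromfile(label_path, dtype="<u4")
    if raw.shape[0] != points.shape[0]:
        raise ValueError(f"{raw.shape[0]} labels for {points.shape[0]} points")
    semantic, instance = decode_labels(raw)
    return points, semantic, instance


# Round trip on a tiny in-memory example (IDs are arbitrary placeholders).
packed = encode_labels([10, 40, 0], [1, 2, 0])
sem_ids, inst_ids = decode_labels(packed)
print(sem_ids.tolist(), inst_ids.tolist())  # [10, 40, 0] [1, 2, 0]
```

Keeping semantic and instance IDs in a single packed integer means one label file serves both segmentation tasks. The length check catches label files that are out of sync with their scans.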