All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes

Jose L. Gómez, Manuel Silva, Antonio Seoane, Agnès Borrás, Mario Noriega, Germán Ros, Jose A. Iglesias-Guitian, Antonio M. López

TL;DR

UrbanSyn tackles the data-labeling bottleneck in autonomous driving by delivering photorealistic synthetic urban scenes with pixel-perfect ground truth, including depth, semantics, and instances. Generated with unbiased path tracing and GIS-informed layouts, UrbanSyn complements GTAV and Synscapes as part of The Three Musketeers to close the synth-to-real gap via multisource unsupervised domain adaptation. Across Cityscapes, BDD100K, and Mapillary Vistas, UrbanSyn strengthens semantic segmentation baselines and, when combined with the other datasets, achieves state-of-the-art synth-to-real UDA performance using HRDA and co-training. The dataset is openly accessible, enabling broad reuse for tasks such as instance segmentation and depth estimation, and guiding future work in active learning and data-centric AI for autonomous driving.

Abstract

We introduce UrbanSyn, a photorealistic dataset acquired through semi-procedurally generated synthetic urban driving scenarios. Developed using high-quality geometry and materials, UrbanSyn provides pixel-level ground truth, including depth, semantic segmentation, and instance segmentation with object bounding boxes and occlusion degree. It complements GTAV and Synscapes datasets to form what we coin as the 'Three Musketeers'. We demonstrate the value of the Three Musketeers in unsupervised domain adaptation for image semantic segmentation. Results on real-world datasets, Cityscapes, Mapillary Vistas, and BDD100K, establish new benchmarks, largely attributed to UrbanSyn. We make UrbanSyn openly and freely accessible (www.urbansyn.org).

Paper Structure (18 sections, 11 figures, 6 tables)

Figures (11)

  • Figure 1: UrbanSyn covers $4$ different ODDs. It supports different lighting conditions, enabling atmospheric participating media (e.g., columns $3$ and $4$), procedurally generating different building landscapes, and shuffling locations and materials of assets. From top to bottom, we see a top view of the ODDs, RGB images captured around the indicated circles within the ODDs, corresponding depth maps (pseudo-color), object instances with their bounding boxes, and per-pixel class semantic labels.
  • Figure 2: UrbanSyn content generation pipeline overview. Our builder scripts combine real GIS data with procedurally generated content and a 3D library of custom and commercial assets to create layers containing configurable content variations.
  • Figure 3: Rendering results using unbiased path tracing (PT) with different sample counts. We set the adaptive sampling to use a maximum of $256$ spp. Adaptive sampling and denoising help reduce undesired PT noise and long rendering times.
  • Figure 4: Content statistics for the Three Musketeers datasets: (Top) Per-class pixel-occupancy distributions. The three datasets appear to show relatively similar distributions, with more in common than what sets them apart. (Bottom) Percentage of images containing samples of the given class. UrbanSyn is on par with Synscapes for certain classes, where both provide more examples than GTAV (e.g., Bus and Rider).
  • Figure 5: Experimental evaluation methodology for synth-to-real UDA procedures. Please refer to the corresponding section of the main text for details.
  • ...and 6 more figures
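
The two statistics reported in Figure 4 (per-class pixel occupancy and the percentage of images containing a class) can be illustrated with a short computation over semantic label maps. The following is a minimal sketch, not the authors' code: the function name `class_statistics`, the nested-list label format, and the toy class IDs are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumed format: one integer class ID per pixel, stored as rows of a 2D list.
LabelMap = List[List[int]]


def class_statistics(
    label_maps: Iterable[LabelMap],
) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Return (pixel occupancy %, image presence %) per class ID."""
    pixel_counts: Counter = Counter()
    image_counts: Counter = Counter()
    total_pixels = 0
    total_images = 0
    for label_map in label_maps:
        flat = [cls for row in label_map for cls in row]
        pixel_counts.update(flat)
        # A class counts once per image, however many pixels it covers.
        image_counts.update(set(flat))
        total_pixels += len(flat)
        total_images += 1
    if total_images == 0 or total_pixels == 0:
        return {}, {}
    occupancy = {c: 100.0 * n / total_pixels for c, n in pixel_counts.items()}
    presence = {c: 100.0 * n / total_images for c, n in image_counts.items()}
    return occupancy, presence


# Toy example with hypothetical class IDs: 0 = road, 1 = car, 2 = bus.
toy_maps = [
    [[0, 0], [0, 1]],
    [[0, 0], [2, 2]],
]
occupancy, presence = class_statistics(toy_maps)
```

In this toy example the bus class covers a quarter of all pixels but appears in only half of the images, which is why the two panels of such a figure can tell different stories about the same class.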