Synscapes: A Photorealistic Synthetic Dataset for Street Scene Parsing
Magnus Wrenninge, Jonas Unger
TL;DR
Synscapes addresses how photorealism and controlled parameter variation in synthetic street-scene data affect training, validation, and analysis of vision systems. It introduces a procedurally generated, 25k-image dataset with photorealistic rendering, Cityscapes-compatible annotations, and rich metadata, enabling de-correlated sampling and metadata-driven experiments. Across semantic segmentation and object detection, Synscapes demonstrates stronger transferability and more informative analyses than prior synthetic datasets, aided by high realism and comprehensive annotations. The work highlights realism as a key factor in synthetic-data utility and shows how detailed metadata can reveal performance biases and guide sensor-simulation improvements, with future work aimed at refining realism metrics and expanding analysis capabilities.
Abstract
We introduce Synscapes -- a synthetic dataset for street scene parsing created using photorealistic rendering techniques, and show state-of-the-art results for training and validation as well as new types of analysis. We study the behavior of networks trained on real data when performing inference on synthetic data: a key factor in determining the equivalence of simulation environments. We also compare the behavior of networks trained on synthetic data and evaluated on real-world data. Additionally, by analyzing existing pre-trained segmentation and detection models, we illustrate how uncorrelated images, together with a detailed set of annotations, open up new avenues for analysis of computer vision systems, providing fine-grained information about how a model's performance changes according to factors such as distance, occlusion, and relative object orientation.
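The metadata-driven analysis mentioned above can be illustrated with a minimal sketch: group per-instance detection outcomes by a metadata attribute (here, distance) and compute recall per bin. The function name `recall_by_distance`, the `(distance, detected)` record layout, the bin edges, and the toy values are all hypothetical choices made for illustration; they are not the Synscapes metadata format or results from the paper.

```python
from collections import defaultdict


def recall_by_distance(instances, bin_edges):
    """Compute recall per distance bin.

    instances: iterable of (distance_m, detected) pairs, one per
               ground-truth object (hypothetical record layout).
    bin_edges: ascending list of bin boundaries in meters; each bin is
               the half-open interval [lo, hi).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for distance, detected in instances:
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
            if lo <= distance < hi:
                totals[(lo, hi)] += 1
                hits[(lo, hi)] += int(detected)
                break  # each instance falls into at most one bin
    # Only bins that contain at least one instance are reported.
    return {b: hits[b] / totals[b] for b in totals}


# Made-up illustration values, not measurements from the paper.
toy = [(5.0, True), (8.0, True), (25.0, True), (30.0, False), (70.0, False)]
result = recall_by_distance(toy, [0, 10, 50, 100])
for (lo, hi), recall in sorted(result.items()):
    print(f"{lo:>3}-{hi:<3} m: recall = {recall:.2f}")
```

The same grouping could be applied to other per-object annotations, such as occlusion or relative orientation, by replacing the distance value with the attribute of interest.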
