A Style-Based Profiling Framework for Quantifying the Synthetic-to-Real Gap in Autonomous Driving Datasets

Dingyi Yao, Xinyao Han, Ruibo Ming, Zhihang Song, Lihui Peng, Jianming Hu, Danya Yao, Yi Zhang

TL;DR

This work tackles the synthetic-to-real gap in autonomous driving by introducing a style-based profiling framework that separates image content from style using Gram-matrix representations. The Style Embedding Distribution Discrepancy (SEDD) metric combines a shallow-layer feature extractor, Gram-based style embeddings, and metric learning with Center Loss and NTXent, producing two gap measures, $SEDD_1$ and $SEDD_2$, for robust, distribution-aware quantification. Through a public benchmark with real and synthetic datasets (e.g., KITTI, Cityscapes, VKITTI1/2) and sim-to-real methods, the authors demonstrate that SEDD can distinguish realism across datasets, outperform No-Reference IQA baselines, and track improvements from photorealism enhancements. The framework operates as a standardized quality-control tool for synthetic data, enabling targeted augmentation and refinement to better support data-driven autonomous driving systems. This approach advances data-centric evaluation by providing objective, interpretable metrics for synthetic-to-real transfer and generalization.

Abstract

Ensuring the reliability of autonomous driving perception systems requires extensive environment-based testing, yet real-world execution is often impractical. Synthetic datasets have therefore emerged as a promising alternative, offering advantages such as cost-effectiveness, bias-free labeling, and controllable scenarios. However, the domain gap between synthetic and real-world datasets remains a major obstacle to model generalization. To address this challenge from a data-centric perspective, this paper introduces a profile extraction and discovery framework for characterizing the style profiles underlying both synthetic and real image datasets. We propose Style Embedding Distribution Discrepancy (SEDD) as a novel evaluation metric. Our framework combines Gram matrix-based style extraction with metric learning optimized for intra-class compactness and inter-class separation to extract style embeddings. Furthermore, we establish a benchmark using publicly available datasets. Experiments are conducted on a variety of datasets and sim-to-real methods, and the results show that our method is capable of quantifying the synthetic-to-real gap. This work provides a standardized profiling-based quality control paradigm that enables systematic diagnosis and targeted enhancement of synthetic datasets, advancing future development of data-driven autonomous driving systems.
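To make the idea of Gram matrix-based style extraction concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: the function names `gram_matrix` and `style_embedding`, the normalization by the number of spatial positions, and the choice to flatten the upper triangle are all assumptions made for illustration. The toy example shows why a Gram matrix is a plausible style descriptor: rearranging the spatial layout (the "content") of a feature map leaves the channel correlations unchanged.

```python
def gram_matrix(feature_map):
    """Channel-by-channel Gram matrix of a C x H x W feature map (nested lists).

    Assumption: entries are normalized by the number of spatial positions H*W.
    """
    # Flatten each channel's H x W grid into one vector of length H*W.
    flat = [[v for row in channel for v in row] for channel in feature_map]
    n = len(flat[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in flat] for fi in flat]


def style_embedding(feature_map):
    """Flatten the upper triangle of the symmetric Gram matrix into a vector."""
    g = gram_matrix(feature_map)
    c = len(g)
    return [g[i][j] for i in range(c) for j in range(i, c)]


# Toy feature map: 2 channels, 2 x 2 spatial grid (hypothetical values).
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
# Same activations with the two spatial rows swapped in every channel.
shuffled = [
    [[3.0, 4.0], [1.0, 2.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]

print(style_embedding(fmap))      # [7.5, 1.25, 0.5]
print(style_embedding(shuffled))  # [7.5, 1.25, 0.5]
```

In practice the feature map would come from a shallow layer of a convolutional network rather than from hand-written lists, and the resulting vector would be passed on to the metric learning stage.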

Paper Structure

This paper contains 25 sections, 13 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Schematic of our Style Embedding Distribution Discrepancy (SEDD). Conventional feature distributions are influenced by image content, separating different real datasets while clustering real datasets with their cloned synthetic counterparts. Our approach disentangles content and style, distinguishes between real and synthetic datasets, and measures the synthetic-to-real gap.
  • Figure 2: The overall profiling framework. The model takes as input real images and synthetic images under various weather conditions. The input images are processed sequentially through three key modules: the feature extractor, the style extractor, and the metric learning module. During training, the model parameters are optimized using a combination of Center Loss and NTXent Loss. In the evaluation phase, the learned feature embeddings are post-processed to compute the final SEDD metric.
  • Figure 3: Visualization of feature maps from different layers of ResNet. Blue, orange, and red points represent samples from the KITTI, Virtual KITTI, and Virtual KITTI 2 datasets, respectively. Different shapes (circles, triangles, and crosses) denote samples under different weather conditions in the synthetic datasets.
  • Figure 4: Visualization of results on the validation set and for sim-to-real methods. The results show that our profiling framework can distinguish the Virtual KITTI dataset from the Virtual KITTI 2 dataset, and that samples processed by sim-to-real methods lie closer to the real ones.
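
The two training losses named in the Figure 2 caption can be sketched as follows. This is a hedged illustration, not the paper's code: the summary above does not give the exact formulations, so the sketch assumes the common definitions, namely a Center Loss that averages half the squared distance to a per-class center, and a label-based NTXent variant in which every same-label pair is a positive and all different-label samples act as negatives. The names `center_loss`, `nt_xent_loss`, and `cosine`, the temperature of 0.1, and the toy embeddings are hypothetical.

```python
import math


def center_loss(embeddings, labels, centers):
    """Mean of 0.5 * squared Euclidean distance to each sample's class center."""
    total = 0.0
    for x, y in zip(embeddings, labels):
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(x, centers[y]))
    return total / len(embeddings)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent_loss(embeddings, labels, temperature=0.1):
    """Label-based NTXent (assumed variant): for each positive pair (i, j),
    -log( e^{s_ij/t} / (e^{s_ij/t} + sum over negatives k of e^{s_ik/t}) )."""
    n = len(embeddings)
    losses = []
    for i in range(n):
        # Negatives for anchor i: every sample with a different label.
        neg_sum = sum(
            math.exp(cosine(embeddings[i], embeddings[k]) / temperature)
            for k in range(n)
            if labels[k] != labels[i]
        )
        for j in range(n):
            if j != i and labels[j] == labels[i]:
                pos = math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
                losses.append(-math.log(pos / (pos + neg_sum)))
    return sum(losses) / len(losses) if losses else 0.0


# Toy 2-D style embeddings forming two tight groups (hypothetical values).
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
grouped = ["real", "real", "synthetic", "synthetic"]
mixed = ["real", "synthetic", "real", "synthetic"]
centers = {"real": [0.95, 0.05], "synthetic": [0.05, 0.95]}

print(center_loss(emb, grouped, centers))
print(nt_xent_loss(emb, grouped), nt_xent_loss(emb, mixed))
```

When the labels agree with the geometric grouping, both losses are small. Assigning labels that cut across the groups makes the NTXent value much larger, which is the behavior that drives intra-class compactness and inter-class separation during training.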