Mind the Domain Gap: Measuring the Domain Gap Between Real-World and Synthetic Point Clouds for Automated Driving Development

Nguyen Duc, Yan-Ling Lai, Patrick Madlindl, Xinyuan Zhu, Benedikt Schwab, Olaf Wysocki, Ludwig Hoegner, Thomas H. Kolbe

TL;DR

This work tackles the domain gap between real-world and synthetic point clouds for automated driving by introducing scene-homogeneous evaluation using real urban CityGML/OpenDRIVE models and a harmonized 12-class labeling scheme. It presents a novel DoGSS-PCL metric that jointly quantifies semantic and geometric divergence and complements it with deterministic and stochastic analyses to assess how synthetic data can augment real data. The methodology is operationalized through a CARLA-based simulation pipeline, with a semantic-modeling framework that preserves ground-truth semantics, enabling rigorous gap measurement. Empirical results show that synthetic semantic point clouds can complement real data, achieving comparable performance at a 50/50 real-to-synthetic mix, while highlighting class-dependent gains and losses and guiding data-generation strategies for safe, large-scale automated-driving testing and digital twinning.

Abstract

Owing to typical long-tail data distribution issues, simulating domain-gap-free synthetic data is crucial in robotics, photogrammetry, and computer vision research. The fundamental challenge is to credibly measure the difference between real and simulated data. Such a measure is vital for safety-critical applications, such as automated driving, where out-of-domain samples may impair a car's perception and cause fatal accidents. Previous work has commonly simulated data on one scene and analyzed performance on a different, real-world scene, which hampers separating the domain-gap contributions that stem from network deficiencies, class definitions, and object representation. In this paper, we propose a novel approach to measuring the domain gap between real-world sensor observations and simulated data representing the same location, enabling comprehensive domain gap analysis. To measure this domain gap, we introduce a novel metric, DoGSS-PCL, and an evaluation assessing the geometric and semantic quality of the simulated point cloud. Our experiments corroborate that the introduced approach can be used to measure the domain gap. The tests also reveal that synthetic semantic point clouds may be used for training deep neural networks, maintaining performance at a 50/50 real-to-synthetic ratio. We strongly believe that this work will facilitate research on credible data simulation and allow for at-scale deployment in automated driving testing and digital twinning.
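The 50/50 real-to-synthetic ratio mentioned above can be illustrated with a minimal sketch of how a mixed training set might be composed. This is not the authors' pipeline: the function name `mix_training_set`, its parameters, and the choice to keep the total training budget equal to the size of the real set are all assumptions made for illustration.

```python
import random


def mix_training_set(real_scans, synthetic_scans, synthetic_fraction=0.5,
                     total=None, seed=0):
    """Return a shuffled list of scans with the requested synthetic share.

    Hypothetical helper: `real_scans` and `synthetic_scans` are lists of
    arbitrary scan handles (file paths, arrays, tuples, ...).
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must lie in [0, 1]")
    if total is None:
        # Assumption: keep the training budget at the size of the real set.
        total = len(real_scans)

    n_synth = round(total * synthetic_fraction)
    n_real = total - n_synth  # guarantees n_real + n_synth == total
    if n_synth > len(synthetic_scans) or n_real > len(real_scans):
        raise ValueError("not enough scans to satisfy the requested mix")

    rng = random.Random(seed)  # fixed seed for a reproducible split
    mixed = rng.sample(real_scans, n_real) + rng.sample(synthetic_scans, n_synth)
    rng.shuffle(mixed)
    return mixed
```

Sweeping `synthetic_fraction` over several values while holding `total` fixed is one way to compare mixes on an equal training budget; whether the paper's experiments fix the budget this way is not stated in the abstract.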

Paper Structure

This paper contains 36 sections, 10 equations, 15 figures, 9 tables.

Figures (15)

  • Figure 1: Unlike established approaches, which measure the domain gap on fictive 3D models and different, real-world locations (top branch), we propose leveraging 3D models representing real-world cities together with the corresponding real-world point clouds, offering a scene-homogeneous geometric and semantic domain gap measure (bottom branch).
  • Figure 2: Overview of our domain gap measurement workflow. We leverage real-world point clouds and manually created semantic 3D urban models to identify the point cloud domain gap deterministically and stochastically (see the sections on the deterministic and stochastic approaches). We also propose unified semantic labels for both 3D-model-simulated and real-world point clouds in accordance with international 3D modeling standards (see the section on semantic label mapping).
  • Figure 3: Real-world point cloud, manually labeled according to the proposed class list (see the class-list table).
  • Figure 4: Developed model processing chain that enables a scanning simulation with automatically assigned labels according to our proposed class list. The road network standard OpenDRIVE and the semantic 3D city model standard CityGML serve as the basis for deriving application-specific mesh geometries.
  • Figure 5: Simulated test drive in the virtual environment with the automatically labeled point cloud after applying noise in the post-processing. The virtual vehicle was elevated to match the sensor positions of the real-world surveying vehicle.
  • ...and 10 more figures