Quantifying and Bridging the Fidelity Gap: A Decisive-Feature Approach to Comparing Synthetic and Real Imagery
Danial Safaei, Siddartha Khastgir, Mohsen Alirezaei, Jeroen Ploeg, Son Tong, Xingyu Zhao
TL;DR
This work addresses the sim-to-real fidelity gap in avatar-based autonomous-vehicle testing by introducing Decisive Feature Fidelity (DFF), a SUT-specific measure of mechanism parity that uses explainable AI to compare the decisive features driving a SUT's decisions in the real and synthetic domains. It presents a practical DFF estimator based on counterfactual explanations and a DFF-guided calibration objective that tunes the synthetic data generator to minimize mechanism gaps while preserving output performance. Experiments on 2126 KITTI–VirtualKITTI2 pairs across three SUT heads show that DFF reveals discrepancies not captured by traditional input- or output-focused metrics, and that DFF-guided calibration improves decisive-feature alignment and input fidelity while keeping task outputs non-inferior. The work advocates using DFF alongside conventional fidelity checks to enable more trustworthy virtual testing and more effective simulator calibration.
Abstract
Virtual testing using synthetic data has become a cornerstone of autonomous vehicle (AV) safety assurance. Despite progress in improving visual realism through advanced simulators and generative AI, recent studies reveal that pixel-level fidelity alone does not ensure reliable transfer from simulation to the real world. What truly matters is whether the system-under-test (SUT) bases its decisions on the same causal evidence in both real and simulated environments, not just whether images "look real" to humans. This paper addresses the lack of such a behavior-grounded fidelity measure by introducing Decisive Feature Fidelity (DFF), a new SUT-specific metric that extends the existing fidelity spectrum to capture mechanism parity, that is, the agreement in causal evidence underlying the SUT's decisions across domains. DFF leverages explainable-AI (XAI) methods to identify and compare the decisive features driving the SUT's outputs for matched real-synthetic pairs. We further propose practical estimators based on counterfactual explanations, along with a DFF-guided calibration scheme to enhance simulator fidelity. Experiments on 2126 matched KITTI-VirtualKITTI2 pairs demonstrate that DFF reveals discrepancies overlooked by conventional output-value fidelity. Furthermore, results show that DFF-guided calibration improves decisive-feature and input-level fidelity without sacrificing output-value fidelity across diverse SUTs.
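
To make the idea of comparing decisive features across matched pairs concrete, the following is a minimal sketch, not the estimator proposed in this paper. It assumes each image already has a per-pixel attribution map from some XAI method, takes the top fraction of pixels by attribution magnitude as a stand-in for the "decisive features", and scores each real-synthetic pair by one minus the intersection-over-union of the two masks. The function names (`decisive_mask`, `mask_distance`, `mean_feature_gap`) and the `top_fraction` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def decisive_mask(attribution, top_fraction=0.1):
    """Boolean mask of the pixels with the largest attribution magnitude.

    Assumption: a top-k threshold on an attribution map stands in for the
    decisive features; the paper's counterfactual estimator is not shown here.
    """
    magnitude = np.abs(np.asarray(attribution, dtype=float))
    flat = magnitude.ravel()
    k = max(1, int(round(top_fraction * flat.size)))
    # Value of the k-th largest element; ties may admit a few extra pixels.
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    return magnitude >= threshold


def mask_distance(mask_real, mask_syn):
    """1 - IoU between two boolean masks: 0 is identical, 1 is disjoint."""
    union = np.logical_or(mask_real, mask_syn).sum()
    if union == 0:
        return 0.0
    intersection = np.logical_and(mask_real, mask_syn).sum()
    return 1.0 - intersection / union


def mean_feature_gap(pairs, top_fraction=0.1):
    """Average mask distance over matched (real, synthetic) attribution maps."""
    gaps = [
        mask_distance(
            decisive_mask(real, top_fraction),
            decisive_mask(syn, top_fraction),
        )
        for real, syn in pairs
    ]
    return float(np.mean(gaps))
```

Under these assumptions, a score near 0 means the SUT attends to the same regions in both domains, while a score near 1 indicates that its decisions rest on different evidence even if the outputs happen to agree.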
