How does Simulation-based Testing for Self-driving Cars match Human Perception?

Christian Birchler; Tanzil Kombarabettu Mohammed; Pooja Rani; Teodora Nechita; Timo Kehrer; Sebastiano Panichella

TL;DR

The paper tackles a critical gap in self-driving car testing: whether the widely used out-of-bound ($O_{OB}$) safety metric aligns with human perceptions of safety and realism in simulated scenarios. It introduces SDC-Alabaster, a VR-enabled Human-in-the-Loop framework, and evaluates 50 participants across two leading simulators (BeamNG.tech and CARLA) to examine how test complexity, interaction, and immersion influence safety judgments and realism. Findings show that human safety perception diverges from $O_{OB}$ in more complex or interactive contexts, and that realism is strongly affected by factors such as viewpoint and environment, necessitating richer, human-aligned safety metrics and realism considerations. These insights underscore the reality-gap problem in simulation-based SDC testing and offer a taxonomy of realism factors to guide future evaluation frameworks and practical testing pipelines.

Abstract

Software metrics such as coverage and mutation scores have been extensively explored for the automated quality assessment of test suites. While traditional tools rely on such quantifiable software metrics, the field of self-driving cars (SDCs) has primarily focused on simulation-based test case generation using quality metrics such as the out-of-bound (OOB) parameter to determine if a test case fails or passes. However, it remains unclear to what extent this quality metric aligns with the human perception of the safety and realism of SDCs, which are critical aspects in assessing SDC behavior. To address this gap, we conducted an empirical study involving 50 participants to investigate the factors that determine how humans perceive SDC test cases as safe, unsafe, realistic, or unrealistic. To this aim, we developed a framework leveraging virtual reality (VR) technologies, called SDC-Alabaster, to immerse the study participants into the virtual environment of SDC simulators. Our findings indicate that the human assessment of the safety and realism of failing and passing test cases can vary based on different factors, such as the test's complexity and the possibility of interacting with the SDC. Especially for the assessment of realism, the participants' age as a confounding factor leads to a different perception. This study highlights the need for more research on SDC simulation testing quality metrics and the importance of human perception in evaluating SDC behavior.
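To make the pass/fail role of the OOB metric concrete, the following is a minimal sketch of a point-based out-of-bound check. It is an illustrative simplification, not the implementation used by the paper or by any simulator: the function names, the polyline lane model, and the `lane_half_width` parameter are all assumptions introduced here.

```python
import math

# Hypothetical, simplified OOB check: the car is modeled as a single point and
# the lane as a centerline polyline with a fixed half width (both assumptions).

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)

def distance_to_centerline(p, centerline):
    """Smallest distance from p to any segment of the centerline polyline."""
    return min(
        point_segment_distance(p, a, b)
        for a, b in zip(centerline, centerline[1:])
    )

def oob_verdict(trajectory, centerline, lane_half_width):
    """Return "fail" if any trajectory point leaves the lane, else "pass"."""
    for p in trajectory:
        if distance_to_centerline(p, centerline) > lane_half_width:
            return "fail"
    return "pass"

centerline = [(0.0, 0.0), (10.0, 0.0)]
close_call = [(1.0, 0.5), (5.0, 1.9)]   # stays inside, but near the lane edge
excursion = [(1.0, 0.5), (5.0, 2.5)]    # leaves the lane
print(oob_verdict(close_call, centerline, lane_half_width=2.0))
print(oob_verdict(excursion, centerline, lane_half_width=2.0))
```

Under these assumptions, a trajectory that skims the lane edge still passes, which hints at why a binary OOB verdict may not match how safe a test looks to a human observer.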

Paper Structure (61 sections, 8 figures, 14 tables)

Introduction
Background
SDC simulators
BeamNG.tech
CARLA
Test generators & Test Runner
Virtual reality
Headset & VR connection with simulation environments
Methodology
Research questions
RQ1: Human-based assessment of safety
RQ2: Impact of human interaction on the assessments of SDCs
RQ3: Human-based assessment of Realism
Design overview
Design implementation
...and 46 more sections

Figures (8)

Figure 1: Examples of simulation-based tests of an SDC.
Figure 2: Examples of unsafe tests with valid OOB criteria
Figure 3: Design overview with survey question IDs from Table \ref{tab:survey-questions}
Figure 4: Perceived safety of failing and passing tests grouped by scenario's complexity
Figure 5: VR vs. no VR
...and 3 more figures
