Diversity-guided Search Exploration for Self-driving Cars Test Generation through Frenet Space Encoding

Timo Blattner, Christian Birchler, Timo Kehrer, Sebastiano Panichella

TL;DR

The paper tackles safety testing for self-driving cars (SDCs) by addressing the lack of diverse, critical scenarios in field and simulation tests. It introduces Frenilla, a framework that combines a transformer-based discriminator, trained to predict the likelihood of an out-of-bounds (OOB) condition, with a genetic algorithm operating on Frenet-encoded road curves. The goal is to produce diverse and valid test cases while preserving fault-detection capability. In a large-scale empirical study on 1,174 simulated tests in BeamNG.tech, Frenilla shows high validity, substantial fault discovery within a two-hour budget, and less time wasted on passing tests relative to prior Frenetic results. Data and code are publicly available. The approach highlights the value of learning a safety signal, here the OOB likelihood, and coupling it with diversity-guided search for scalable, informative SDC test generation.

Abstract

The rise of self-driving cars (SDCs) presents important safety challenges to address in dynamic environments. While field testing is essential, current methods lack diversity in assessing critical SDC scenarios. Prior research introduced simulation-based testing for SDCs, with Frenetic, a test generation approach based on Frenet space encoding, achieving a relatively high percentage of valid tests (approximately 50%) characterized by naturally smooth curves. The "minimal out-of-bound distance" is often taken as a fitness function, which we argue to be a sub-optimal metric. Instead, we show that the likelihood of leading to an out-of-bound condition can be learned by a vanilla transformer deep-learning model. We combine this "inherently learned metric" with a genetic algorithm, which has been shown to produce a high diversity of tests. To validate our approach, we conducted a large-scale empirical evaluation on a dataset comprising over 1,174 simulated test cases created to challenge the SDC's behavior. Our investigation revealed that our approach substantially reduces the generation of non-valid test cases, increases diversity, and achieves high accuracy in identifying safety violations during SDC test execution.
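
To make the idea of a Frenet-encoded road concrete, the sketch below converts a sequence of (curvature, step size) pairs, the per-point representation named in the Figure 2 caption, into Cartesian road points. The function name `frenet_to_cartesian`, the starting pose, and the midpoint integration scheme are assumptions chosen for illustration; this is not the Frenetic or Frenilla implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def frenet_to_cartesian(
    segments: Sequence[Tuple[float, float]],
    start: Point = (0.0, 0.0),
    heading: float = 0.0,
) -> List[Point]:
    """Integrate (curvature, step size) pairs into Cartesian road points.

    Hypothetical helper: each pair (kappa, ds) is treated as a short arc of
    length ds along which the heading turns by kappa * ds radians.
    """
    x, y = start
    theta = heading
    points: List[Point] = [(x, y)]
    for kappa, ds in segments:
        # Advance along the heading at the middle of the step (midpoint rule),
        # which keeps constant-curvature roads on a closed circle.
        theta_mid = theta + 0.5 * kappa * ds
        x += ds * math.cos(theta_mid)
        y += ds * math.sin(theta_mid)
        # Update the heading by the total turn of this step.
        theta += kappa * ds
        points.append((x, y))
    return points


# A straight stretch followed by a gentle left-hand bend.
road = frenet_to_cartesian([(0.0, 10.0)] * 3 + [(0.02, 10.0)] * 5)
```

Because every step changes the heading only by `kappa * ds`, bounding the curvature values bounds how sharply the road can turn, which is one plausible reason a curvature-based encoding yields smooth curves. A search algorithm can then mutate the curvature sequence directly instead of moving Cartesian points.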

Paper Structure (16 sections, 2 figures, 1 table)

Figures (2)

  • Figure 1: The road in grey, with the simulation trace in green showing the out-of-bounds conditions in red
  • Figure 2: The discriminator model takes the previous road points (curvature, step size) as input to predict for each point the likelihood of an OOB condition