Assessing the Completeness of Traffic Scenario Categories for Automated Highway Driving Functions via Cluster-based Analysis
Niklas Roßberg, Marion Neumeier, Sinan Hasirlioglu, Mohamed Essayed Bouzouraa, Michael Botsch
TL;DR
This paper tackles safety certification for automated highway driving by representing observed traffic as a finite catalog of $Q$ scenario categories and by using a completeness test based on the Coupon Collector's Problem (CCP), with confidence $\tau$, to determine how much data is needed. It introduces a CVQ-VAE-based clustering pipeline that employs a dynamically updated codebook to generate scenario catalogs, together with a CCP-based completeness check that quantifies the data required for complete coverage. Evaluated on the highD dataset, the approach achieves better codebook utilization and lower reconstruction loss as the codebook size grows, though classification accuracy does not always improve with more categories. The results reveal a trade-off between the number of scenario categories and data requirements, and point to the need for a robust metric to select the optimal catalog size for ADS validation.
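To make the CCP-based completeness idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that scenario categories occur independently and with equal probability, which is the textbook coupon collector setting and may be simpler than the model used in the paper. The function name `draws_for_complete_coverage` and its parameters are hypothetical. It returns the smallest number of observed scenarios $n$ such that all $Q$ categories have been seen at least once with probability of at least $\tau$.

```python
def draws_for_complete_coverage(num_categories: int, confidence: float) -> int:
    """Smallest n with P(all categories seen within n draws) >= confidence.

    Assumption (not from the paper): draws are i.i.d. and every category
    has the same probability 1 / num_categories.
    """
    if num_categories < 1:
        raise ValueError("num_categories must be at least 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")

    q = num_categories
    # dist[j] = probability that exactly j distinct categories were seen so far.
    dist = [0.0] * (q + 1)
    dist[0] = 1.0
    n = 0
    while dist[q] < confidence:
        new_dist = [0.0] * (q + 1)
        for j, p in enumerate(dist):
            if p == 0.0:
                continue
            # The next scenario falls into an already observed category.
            new_dist[j] += p * j / q
            # The next scenario falls into a category not observed before.
            if j < q:
                new_dist[j + 1] += p * (q - j) / q
        dist = new_dist
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative values only; these are not results from the paper.
    for q in (10, 50, 100):
        print(q, draws_for_complete_coverage(q, confidence=0.95))
```

Under these assumptions the required number of scenarios grows faster than linearly in $Q$, which illustrates the trade-off between catalog size and data requirements. With unequal category frequencies, rare categories dominate the requirement and the uniform case gives only a lower bound.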
Abstract
The ability to operate safely in increasingly complex traffic scenarios is a fundamental requirement for Automated Driving Systems (ADS). Ensuring the safe release of ADS functions necessitates a precise understanding of the traffic scenarios that occur. To support this objective, this work introduces a pipeline for traffic scenario clustering and the analysis of scenario category completeness. The Clustering Vector Quantized-Variational Autoencoder (CVQ-VAE) is employed to cluster highway traffic scenarios and to create several catalogs with differing numbers of traffic scenario categories. Subsequently, the impact of the number of categories on the completeness of the traffic scenario categories is analyzed. The results show improved clustering performance compared to previous work. The trade-off between cluster quality and the amount of data required to maintain completeness is discussed based on the publicly available highD dataset.
