CLIPping the Limits: Finding the Sweet Spot for Relevant Images in Automated Driving Systems Perception Testing

Philipp Rigoll; Laurenz Adolph; Lennart Ries; Eric Sax

TL;DR

This work tackles the challenge of assembling semantically relevant image subsets for automated driving perception testing. It uses CLIP to rank images by their similarity to a natural-language query and introduces an automatic threshold that turns the ranking into a usable partial data set. The cosine-distance distribution is modeled as the sum of two Gaussian distributions, and the threshold is derived from the intersection of the two components; a single-Gaussian fallback keeps the procedure automated when the two-Gaussian fit fails. The method weighs false positives and false negatives equally and is demonstrated on the ACDC data set with prompts such as snow, fog, rain, and night, plus a fallback analysis on traffic lights. The approach reduces manual curation and enables scalable, automated testing of perception robustness, while acknowledging prompt-dependent variability and outlining directions for prompt optimization and broader model evaluation.
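To make the intersection idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the two Gaussian components have already been fitted and are each described by a weight, a mean, and a standard deviation; the function names and the example parameter values are hypothetical. Equating the logarithms of the two weighted components yields a quadratic in the cosine distance, and the root that lies between the two means is taken as the threshold.

```python
import math


def weighted_gaussian(x, weight, mean, std):
    """Value of weight * N(x; mean, std) at x."""
    z = (x - mean) / std
    return weight * math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def intersection_threshold(w1, mu1, s1, w2, mu2, s2):
    """Return the crossing point of two weighted Gaussians that lies
    between their means (the smallest one if there are two), or None.

    Equating the logs of both components gives a*x^2 + b*x + c = 0.
    """
    a = 1.0 / (2.0 * s2 ** 2) - 1.0 / (2.0 * s1 ** 2)
    b = mu1 / s1 ** 2 - mu2 / s2 ** 2
    c = (mu2 ** 2 / (2.0 * s2 ** 2) - mu1 ** 2 / (2.0 * s1 ** 2)
         + math.log((w1 * s2) / (w2 * s1)))

    if math.isclose(s1, s2):
        # Equal widths: the quadratic term vanishes, at most one root remains.
        roots = [-c / b] if b != 0.0 else []
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        sqrt_disc = math.sqrt(disc)
        roots = [(-b + sqrt_disc) / (2.0 * a), (-b - sqrt_disc) / (2.0 * a)]

    lo, hi = min(mu1, mu2), max(mu1, mu2)
    between = [r for r in roots if lo < r < hi]
    return min(between) if between else None


# Hypothetical fit: a small "relevant" component at low cosine distance
# and a larger "irrelevant" component at higher cosine distance.
threshold = intersection_threshold(0.2, 0.70, 0.02, 0.8, 0.80, 0.03)
```

A return value of `None` signals that no usable crossing exists between the two means. In this sketch, that is the point at which a caller would switch to a fallback such as the single-Gaussian model mentioned above.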

Abstract

Perception systems, especially cameras, are the eyes of automated driving systems. Ensuring that they function reliably and robustly is therefore an important building block in the automation of vehicles. There are various approaches to testing the perception of automated driving systems. Ultimately, however, it always comes down to investigating the behavior of perception systems under specific input data. Camera images are a crucial part of that input data. Image data sets are therefore collected for testing automated driving systems, but finding specific images in these data sets is non-trivial. Thanks to recent developments in neural networks, there are now methods for sorting the images in a data set according to their similarity to a natural-language prompt. To further automate the provision of search results, we automate the definition of a threshold in these sorted results and return only the images relevant to the prompt. Our focus is on preventing false positives and false negatives equally. Robustness matters as well: in case our assumptions are not fulfilled, we provide a fallback solution.
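The retrieval step described in the abstract can be illustrated with a short sketch. It is a simplified stand-in rather than the paper's pipeline: the embeddings are toy two-dimensional vectors in place of real CLIP text and image embeddings, and the file names, the function names, and the threshold value of 0.1 are all hypothetical. Images are sorted by cosine distance to the prompt, and only those at or below the threshold are returned.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def select_relevant(prompt_embedding, image_embeddings, threshold):
    """Rank images by cosine distance to the prompt (closest first)
    and keep only those whose distance does not exceed the threshold."""
    ranked = sorted(
        ((name, cosine_distance(prompt_embedding, emb))
         for name, emb in image_embeddings.items()),
        key=lambda item: item[1],
    )
    return [(name, dist) for name, dist in ranked if dist <= threshold]


# Toy data: hypothetical embeddings, not produced by an actual model.
prompt = [1.0, 0.0]
images = {
    "snowy_road.png": [0.9, 0.1],
    "clear_day.png": [0.1, 0.9],
    "light_snow.png": [0.7, 0.3],
}
hits = select_relevant(prompt, images, threshold=0.1)
```

With a fixed threshold like this, the cut-off would have to be chosen by hand for every prompt; the contribution described in the abstract is to derive it automatically from the distribution of the distances.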

Paper Structure (10 sections, 8 equations, 8 figures, 1 table)

INTRODUCTION
RELATED WORK
METHOD
Sum of two Gaussian distributions
Fallback: single Gaussian distribution
EVALUATION
General Functionality
Quantitative Experiments
Fallback Threshold
CONCLUSION

Figures (8)

Figure 1: Schematic illustration of our method for determining a threshold value based on the distribution of the similarity values.
Figure 2: Range of cosine distances of different prompts to all images in the ACDC data set (Sakaridis et al., 2021).
Figure 3: Procedure to decide which modeling best fits the distribution of the cosine distances.
Figure 4: Images sorted by their cosine distance to the prompt 'snow', the threshold based on the optimum F1 score, our threshold, and the ground truth as color coding.
Figure 5: Distribution of the cosine distances of the ACDC images to the prompt 'snow' with the fitted Gaussian distributions and associated thresholds.
...and 3 more figures
