Detecting Systematic Weaknesses in Vision Models along Predefined Human-Understandable Dimensions
Sujan Sai Gannamaneni, Rohil Prakash Rao, Michael Mock, Maram Akila, Stefan Wrobel
TL;DR
This work presents a Systematic Weakness Detector (SWD) that identifies human-understandable, safety-relevant weaknesses in vision models. It generates semantic metadata with foundation models (e.g., CLIP) and then performs noise-aware slice discovery with SliceLine. A Bayesian framework corrects for labeling noise in the metadata, producing corrected slice errors that allow more accurate identification of weak data slices aligned with predefined operational design domain (ODD) dimensions. Across synthetic and real-world vision tasks (including CelebA and pedestrian detection datasets), SWD-3 (the noise-corrected fusion) reliably recovers ground-truth weaknesses and yields more actionable insights than metadata-free state-of-the-art methods such as DOMINO, Spotlight, and SVM-FD. The approach supports safety arguments and targeted data acquisition for retraining, contributing to trustworthy AI in safety-critical domains such as autonomous driving and surveillance.
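To make the slice discovery step concrete, the following is a minimal sketch of a search over per-image metadata. It is an exhaustive brute-force enumeration rather than SliceLine itself, which additionally prunes the search lattice and scores slices by a combination of error and size. The function name `find_weak_slices`, the attribute names, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def find_weak_slices(rows, errors, max_level=2, min_support=2, top_k=3):
    """Rank attribute=value conjunctions by mean error (brute force, not SliceLine)."""
    dims = sorted(rows[0].keys())
    stats = defaultdict(lambda: [0, 0.0])  # slice key -> [sample count, summed error]
    for row, err in zip(rows, errors):
        for level in range(1, max_level + 1):
            for combo in combinations(dims, level):
                key = tuple((d, row[d]) for d in combo)
                stats[key][0] += 1
                stats[key][1] += err
    # Keep slices with enough samples, then sort by error (desc), size (desc), name.
    scored = [(total / n, n, key) for key, (n, total) in stats.items() if n >= min_support]
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return scored[:top_k]


# Hypothetical metadata (e.g., as a zero-shot classifier might assign) and
# per-image errors of the model under test (1 = wrong prediction).
rows = [
    {"gender": "female", "hair": "blond"},
    {"gender": "female", "hair": "blond"},
    {"gender": "female", "hair": "dark"},
    {"gender": "male", "hair": "blond"},
    {"gender": "male", "hair": "dark"},
    {"gender": "male", "hair": "dark"},
]
errors = [1, 1, 0, 0, 0, 1]

top = find_weak_slices(rows, errors)
for mean_err, n, key in top:
    print(f"{dict(key)}: error={mean_err:.2f}, n={n}")
```

In this toy data the weakest slice is the conjunction of two attributes, which neither attribute alone would reveal as clearly. The `min_support` threshold is an assumed stand-in for the minimum slice size that practical slice discovery methods enforce.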
Abstract
Slice discovery methods (SDMs) are prominent algorithms for finding systematic weaknesses in deep neural networks (DNNs). They identify the top-k semantically coherent slices (subsets) of data on which a DNN-under-test performs poorly. To be directly useful, slices should be aligned with human-understandable and relevant dimensions, which are defined, for example, by safety and domain experts as part of the operational design domain (ODD). While SDMs can be applied effectively to structured data, their application to image data is complicated by the lack of semantic metadata. To address this issue, we present an algorithm that combines foundation models for zero-shot image classification, which generate semantic metadata, with methods for combinatorial search, which find systematic weaknesses in images. In contrast to existing approaches, ours identifies weak slices that are in line with predefined human-understandable dimensions. Because the algorithm relies on foundation models, its intermediate and final results may not always be exact. We therefore include an approach to address the impact of noisy metadata. We validate our algorithm on both synthetic and real-world datasets, demonstrating its ability to recover human-understandable systematic weaknesses. Furthermore, using our approach, we identify systematic weaknesses of multiple pre-trained and publicly available state-of-the-art computer vision DNNs.
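As a rough illustration of why noisy metadata matters, the sketch below inverts a simple two-component mixture to recover slice error rates from observed ones. This is a point-estimate simplification and not the Bayesian framework of the paper. It assumes a single binary attribute, known label precisions, and that model errors depend only on the true attribute value and not on the noisy label. The function name `correct_slice_errors` and all numbers are hypothetical.

```python
def correct_slice_errors(err_in, err_out, prec_in, prec_out):
    """Estimate true slice errors from errors observed under noisy slice labels.

    err_in / err_out: observed error rates of samples labeled inside / outside the slice.
    prec_in: assumed fraction of samples labeled "inside" that truly are inside.
    prec_out: assumed fraction of samples labeled "outside" that truly are outside.

    Assumed mixture model (not taken from the paper):
        err_in  = prec_in * e_in + (1 - prec_in) * e_out
        err_out = (1 - prec_out) * e_in + prec_out * e_out
    """
    det = prec_in + prec_out - 1.0  # determinant of the 2x2 mixing matrix
    if abs(det) < 1e-9:
        raise ValueError("labels carry no information; correction is undefined")
    e_in = (prec_out * err_in - (1.0 - prec_in) * err_out) / det
    e_out = (prec_in * err_out - (1.0 - prec_out) * err_in) / det
    # Clamp to valid rates, since sampling noise can push estimates outside [0, 1].
    clamp = lambda x: min(1.0, max(0.0, x))
    return clamp(e_in), clamp(e_out)


# Hypothetical numbers: the observed slice error (0.42) understates the true one.
corrected_in, corrected_out = correct_slice_errors(
    err_in=0.42, err_out=0.14, prec_in=0.8, prec_out=0.9
)
print(round(corrected_in, 3), round(corrected_out, 3))
```

Under these assumed precisions, mislabeled samples dilute the weak slice with easier data, so the corrected error inside the slice comes out higher than the observed one, and the corrected error outside comes out lower.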
