Background Invariance Testing According to Semantic Proximity
Zukang Liao, Min Chen
TL;DR
The paper addresses background invariance testing in ML by showing that visualization-based analyses reveal differences among models that share the same global statistics. It introduces an association ontology to semantically expand detected keywords and to guide non-uniform background sampling, enabling diverse yet representative test suites. Empirical results demonstrate that keyword-based sampling using the ontology yields the best balance between testing diversity and annotation reliability, and the framework can be automated with around 80% accuracy. This work enhances the reliability and scalability of background invariance testing, supporting more robust deployment in real-world settings.
Abstract
In many applications, machine-learned (ML) models are required to hold certain invariance qualities, such as rotation, size, and intensity invariance. Among these, background invariance is particularly challenging to test because the space of possible backgrounds is vast and complex. To evaluate invariance qualities, we first use a visualization-based testing framework that allows human analysts to assess the invariance properties of ML models and make informed decisions about them. We show that such an informative testing framework is preferable because ML models with the same global statistics (e.g., accuracy scores) can behave differently and exhibit different visualized testing patterns. However, human analysts may not reach consistent decisions without a systematic sampling approach for selecting representative test suites. In this work, we present a technical solution for selecting background scenes according to their semantic proximity to a target image that contains the foreground object being tested. We use association analysis to construct an ontology that stores knowledge about relationships among different objects. This ontology supports an efficient and meaningful search for background scenes at different semantic distances from a target image, so that the selected test suite is both diverse and reasonable. Compared with other testing techniques, e.g., random sampling, nearest neighbors, or test suites sampled by visual-language models (VLMs), our method achieves a superior balance between the diversity of the test suite and the consistency of human annotations, thereby enhancing the reliability and comprehensiveness of background invariance testing.
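To make the idea of sampling backgrounds by semantic proximity concrete, the following is a minimal sketch and not the authors' implementation. It assumes each candidate background scene comes with a list of detected keywords, builds a simple co-occurrence graph as a stand-in for the association ontology, and measures semantic distance as the number of hops from the target image's keywords. The function names, the `min_support` threshold, the hop-count distance, and the toy scenes are all hypothetical choices made for illustration.

```python
import random
from collections import Counter, deque
from itertools import combinations


def build_association_graph(scene_keywords, min_support=2):
    """Link two keywords if they co-occur in at least `min_support` scenes.

    This is a simplified stand-in for an ontology built by association analysis.
    """
    pair_count = Counter()
    for keywords in scene_keywords.values():
        # Sort so each unordered pair is always counted under the same key.
        pair_count.update(combinations(sorted(set(keywords)), 2))
    graph = {}
    for (a, b), count in pair_count.items():
        if count >= min_support:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def keyword_distances(graph, seeds):
    """Breadth-first search: hop distance from the seed keywords to every reachable keyword."""
    dist = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def sample_by_proximity(scene_keywords, target_keywords, per_bin=1, seed=0):
    """Group scenes by semantic distance to the target and draw `per_bin` scenes per group."""
    graph = build_association_graph(scene_keywords)
    dist = keyword_distances(graph, target_keywords)
    # Scenes with no reachable keyword go into one extra, farthest bin (an assumption).
    unreachable = max(dist.values(), default=0) + 1
    bins = {}
    for scene, keywords in scene_keywords.items():
        # Assumption: a scene is as close as its closest keyword.
        d = min((dist.get(k, unreachable) for k in keywords), default=unreachable)
        bins.setdefault(d, []).append(scene)
    rng = random.Random(seed)
    suite = {}
    for d in sorted(bins):
        pool = sorted(bins[d])
        suite[d] = rng.sample(pool, min(per_bin, len(pool)))
    return suite


# Toy data: scene id -> detected keywords (hypothetical).
scenes = {
    "s1": ["boat", "sea"],
    "s2": ["boat", "sea", "beach"],
    "s3": ["sea", "beach"],
    "s4": ["beach", "palm"],
    "s5": ["beach", "palm"],
    "s6": ["office", "desk"],
    "s7": ["office", "desk"],
}
suite = sample_by_proximity(scenes, target_keywords=["boat"])
print(suite)
```

Drawing the same number of scenes from every distance bin is one simple way to obtain a non-uniform sample: semantically distant backgrounds, such as the office scenes here, are still represented even though they are unrelated to the target keyword.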
