Noise or Signal: The Role of Image Backgrounds in Object Recognition
Kai Xiao, Logan Engstrom, Andrew Ilyas, Aleksander Madry
TL;DR
This work measures how strongly image backgrounds drive object recognition. The authors build a toolkit for disentangling foreground from background, along with a family of IN-9 datasets (including the larger IN-9L). They show that backgrounds carry substantial predictive signal, that models are vulnerable to adversarially chosen backgrounds, and that training on mixed-background data reduces reliance on backgrounds while preserving accuracy on real-world data. They also analyze how progress on standard benchmarks relates to background dependence and discuss possible robustness strategies, such as distributionally robust optimization. Overall, the study treats background cues as both a potential aid and a pitfall in modern vision systems, and offers a concrete framework for measuring and improving robustness to contextual signals.
Abstract
We assess the tendency of state-of-the-art object recognition models to depend on signals from image backgrounds. We create a toolkit for disentangling foreground and background signal on ImageNet images, and find that (a) models can achieve non-trivial accuracy by relying on the background alone, (b) models often misclassify images even in the presence of correctly classified foregrounds (up to 87.5% of the time with adversarially chosen backgrounds), and (c) more accurate models tend to depend on backgrounds less. Our analysis of backgrounds brings us closer to understanding which correlations machine learning models use, and how they determine models' out-of-distribution performance.
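To make the idea of separating foreground from background concrete, the following is a minimal sketch, not the authors' toolkit. It assumes a boolean foreground mask is already available for each image; how such a mask is obtained is outside this sketch. The function names (`mix_background`, `only_background`, `find_adversarial_background`) and the `classify` callable are hypothetical and introduced only for illustration.

```python
import numpy as np


def mix_background(image, fg_mask, background):
    """Paste the masked foreground of `image` onto `background`.

    image, background: uint8 arrays of shape (H, W, 3).
    fg_mask: boolean array of shape (H, W), True on foreground pixels
             (assumed to be given; this sketch does not compute it).
    """
    if image.shape != background.shape or image.shape[:2] != fg_mask.shape:
        raise ValueError("image, background, and mask must share height and width")
    # Broadcast the (H, W) mask across the channel axis.
    return np.where(fg_mask[..., None], image, background)


def only_background(image, fg_mask, fill=0):
    """Return a copy of `image` with the foreground pixels set to `fill`."""
    out = image.copy()
    out[fg_mask] = fill
    return out


def find_adversarial_background(classify, image, fg_mask, backgrounds, true_label):
    """Return the index of the first background that changes the prediction.

    classify: hypothetical callable mapping an (H, W, 3) array to a label.
    Returns None if every candidate background leaves the prediction correct.
    """
    for i, bg in enumerate(backgrounds):
        if classify(mix_background(image, fg_mask, bg)) != true_label:
            return i
    return None
```

Under these assumptions, evaluating a model on `only_background` outputs probes finding (a), and searching over candidate backgrounds with `find_adversarial_background` is one simple way to probe finding (b). The exhaustive search shown here is an illustrative choice, not a description of the paper's procedure.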
