XIMAGENET-12: An Explainable AI Benchmark Dataset for Model Robustness Evaluation

Qiang Li; Dan Zhang; Shengzhao Lei; Xun Zhao; Porawit Kamnoedboon; WeiWei Li; Junhao Dong; Shuyan Li

TL;DR

XIMAGENET-12 introduces an explainable robustness benchmark for vision models by combining precise foreground/background annotations with six realistic perturbations across 12 ImageNet categories. It defines a variance-based robustness score that jointly accounts for cross-scenario and within-scenario performance, and demonstrates how background manipulation affects state-of-the-art backbones and segmentation models. The results reveal that background changes, especially random substitutions, can drastically reduce accuracy, while models trained on segmented foregrounds show resilience to missing backgrounds; importantly, higher nominal accuracy does not always correlate with robustness. The dataset and accompanying code enable detailed cross-scenario robustness evaluation, with practical implications for deployment in real-world settings and for guiding model selection in industrial applications and domain adaptation tasks.

Abstract

Despite the promising performance of existing visual models on public benchmarks, critically assessing their robustness for real-world applications remains an ongoing challenge. To bridge this gap, we propose an explainable visual dataset, XIMAGENET-12, to evaluate the robustness of visual models. XIMAGENET-12 consists of over 200K images with 15,410 manual semantic annotations. Specifically, we deliberately selected 12 categories from ImageNet, representing objects commonly encountered in practical life. To simulate real-world situations, we incorporated six diverse scenarios, such as overexposure, blurring, and color changes. We further develop a quantitative criterion for robustness assessment, allowing for a nuanced understanding of how visual models perform under varying conditions, notably in relation to the background. We make the XIMAGENET-12 dataset and its corresponding code openly accessible at https://sites.google.com/view/ximagenet-12/home. We expect that the introduction of the XIMAGENET-12 dataset will empower researchers to thoroughly evaluate the robustness of their visual models under challenging conditions.
