RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations
Kaichen Zhou, Xinhai Chang, Taewhan Kim, Jiadong Zhang, Yang Cao, Chufei Peng, Fangneng Zhan, Hao Zhao, Hao Dong, Kai Ming Ting, Ye Zhu
TL;DR
RAD introduces a realistic, robot-captured, multi-view anomaly detection benchmark designed to stress pose variation, reflective materials, and viewpoint-dependent defect visibility. It evaluates 2D feature-based methods, 3D reconstruction pipelines, and vision-language models (VLMs) under pose-agnostic conditions. Mature 2D feature methods surpass 3D and VLM-based approaches at the image level, while 3D methods and VLMs show only limited gains at the pixel level because of reconstruction artifacts, reflectance, and sparse viewpoints. The study identifies reflective materials, geometric symmetry, and sparse viewpoint coverage as fundamental challenges, and argues for methods that jointly reason over appearance and geometry with uncertainty. The RAD dataset and benchmark provide a challenging, publicly available testbed to drive progress in realistic robotic anomaly detection beyond controlled laboratory setups.
Abstract
Anomaly detection is a core capability for robotic perception and industrial inspection, yet most existing benchmarks are collected under controlled conditions with fixed viewpoints and stable illumination, failing to reflect real deployment scenarios. We introduce RAD (Realistic Anomaly Detection), a robot-captured, multi-view dataset designed to stress pose variation, reflective materials, and viewpoint-dependent defect visibility. RAD covers 13 everyday object categories and four realistic defect types (scratched, missing, stained, and squeezed), captured from over 60 robot viewpoints per object under uncontrolled lighting. We benchmark a wide range of state-of-the-art approaches, including 2D feature-based methods, 3D reconstruction pipelines, and vision-language models (VLMs), under a pose-agnostic setting. Surprisingly, we find that mature 2D feature-embedding methods consistently outperform recent 3D and VLM-based approaches at the image level, while the performance gap narrows for pixel-level localization. Our analysis reveals that reflective surfaces, geometric symmetry, and sparse viewpoint coverage fundamentally limit current geometry-based and zero-shot methods. RAD establishes a challenging and realistic benchmark for robotic anomaly detection, highlighting critical open problems beyond controlled laboratory settings.
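The image-level versus pixel-level distinction above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: AUROC is used as the metric, each method outputs a per-pixel anomaly map, and the image-level score is taken as the maximum of that map (a common convention, not confirmed as the RAD protocol). The names `auroc` and `evaluate` are hypothetical helpers for illustration only.

```python
import numpy as np


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic); ties get average ranks."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both normal and anomalous samples")
    # Average 1-based rank of every score, shared within each tie group.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))


def evaluate(anomaly_maps, masks):
    """Compute image-level and pixel-level AUROC.

    anomaly_maps: array of shape (N, H, W), higher value = more anomalous.
    masks: boolean array of shape (N, H, W), True where a defect is annotated.
    Assumption: image score = max over the anomaly map of that image.
    """
    maps = np.asarray(anomaly_maps, dtype=float)
    masks = np.asarray(masks).astype(bool)
    image_scores = maps.reshape(len(maps), -1).max(axis=1)
    image_labels = masks.reshape(len(masks), -1).any(axis=1)
    return {
        "image_auroc": auroc(image_scores, image_labels),
        "pixel_auroc": auroc(maps, masks),
    }


# Toy usage: one defective view and one normal view, each 2x2 pixels.
toy_maps = np.array([[[0.1, 0.2], [0.1, 0.9]],
                     [[0.1, 0.2], [0.3, 0.1]]])
toy_masks = np.zeros((2, 2, 2), dtype=bool)
toy_masks[0, 1, 1] = True  # the single annotated defect pixel
toy_result = evaluate(toy_maps, toy_masks)
```

Under these assumptions, a method can rank whole images well (high image-level score) while still localizing defects poorly, which is why the two levels are reported separately.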
