RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations

Kaichen Zhou, Xinhai Chang, Taewhan Kim, Jiadong Zhang, Yang Cao, Chufei Peng, Fangneng Zhan, Hao Zhao, Hao Dong, Kai Ming Ting, Ye Zhu

TL;DR

RAD is a realistic, robot-captured, multi-view anomaly detection benchmark designed to stress pose variation, reflective materials, and viewpoint-dependent defect visibility. It evaluates 2D feature-based methods, 3D reconstruction pipelines, and vision-language models (VLMs) under a pose-agnostic setting. Mature 2D feature-based methods outperform 3D and VLM-based approaches at the image level, while 3D methods and VLMs show limited gains at the pixel level because of reconstruction artifacts, reflectance, and sparse viewpoints. The study identifies reflective materials, geometric symmetry, and sparse viewpoint coverage as fundamental challenges and argues for methods that jointly reason over appearance and geometry with uncertainty. The RAD dataset and benchmark provide a challenging, publicly available testbed for robotic anomaly detection beyond controlled laboratory setups.

Abstract

Anomaly detection is a core capability for robotic perception and industrial inspection, yet most existing benchmarks are collected under controlled conditions with fixed viewpoints and stable illumination, failing to reflect real deployment scenarios. We introduce RAD (Realistic Anomaly Detection), a robot-captured, multi-view dataset designed to stress pose variation, reflective materials, and viewpoint-dependent defect visibility. RAD covers 13 everyday object categories and four realistic defect types (scratched, missing, stained, and squeezed), captured from over 60 robot viewpoints per object under uncontrolled lighting. We benchmark a wide range of state-of-the-art approaches, including 2D feature-based methods, 3D reconstruction pipelines, and vision-language models (VLMs), under a pose-agnostic setting. Surprisingly, we find that mature 2D feature-embedding methods consistently outperform recent 3D and VLM-based approaches at the image level, while the performance gap narrows for pixel-level localization. Our analysis reveals that reflective surfaces, geometric symmetry, and sparse viewpoint coverage fundamentally limit current geometry-based and zero-shot methods. RAD establishes a challenging and realistic benchmark for robotic anomaly detection, highlighting critical open problems beyond controlled laboratory settings.

Paper Structure

This paper contains 15 sections, 4 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Gallery of RAD. RAD contains 13 industrial object categories captured from 68 robot viewpoints under uncontrolled illumination, introducing variations in pose, reflectance, and geometric symmetry that pose significant challenges for existing anomaly detectors.
  • Figure 2: Overview of the RAD Anomaly Detection Benchmark Pipeline. The framework integrates robotic multi-view Data Collection, fine-grained Data Annotation, and three specialized models (2D, 3D, and customizable Text-VLM) to achieve comprehensive outputs including classification, pixel segmentation, and anomaly type identification.
  • Figure 3: Illustration of Annotation Procedure. This diagram illustrates the annotation process for identifying missing components. Annotations for missing parts rely on comparing defective objects with normal ones.
  • Figure 4: Dataset Metrics. (a) illustrates the pixel-wise ratio within each defect across various categories. (b) shows the ratio of the total number of each defect in RAD. "Mi." refers to Missing. "No." refers to Normal. "Sq." refers to Squeezed. "Sc." refers to Scratched. "St." refers to Stained. (c) shows the pixel-wise defect ratio across different categories.
  • Figure 5: Visualization of Pixel-Wise Anomaly Detection Baselines. The heatmap visualization illustrates the ground truth and inference results for two anomalous objects, Bowl and Spraybottle.
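
The pixel-wise defect ratio reported in Figure 4 can be illustrated with a minimal sketch. It assumes each annotation is a binary mask stored as nested lists, with 1 marking a defect pixel; that encoding, the function names `defect_pixel_ratio` and `ratio_by_category`, and the toy masks are hypothetical and are not taken from the RAD release.

```python
from typing import Dict, List, Sequence

# Rows of 0/1 values; 1 marks a defect pixel (assumed encoding, not RAD's format).
Mask = Sequence[Sequence[int]]


def defect_pixel_ratio(mask: Mask) -> float:
    """Fraction of pixels in one binary mask that are marked defective."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    defect = sum(1 for row in mask for value in row if value)
    return defect / total


def ratio_by_category(masks: Dict[str, List[Mask]]) -> Dict[str, float]:
    """Pool all masks of a category and return its overall defect-pixel ratio."""
    result: Dict[str, float] = {}
    for category, category_masks in masks.items():
        total = sum(len(row) for m in category_masks for row in m)
        defect = sum(1 for m in category_masks for row in m for v in row if v)
        result[category] = defect / total if total else 0.0
    return result


# Toy masks for illustration only; the category names are borrowed from Figure 5.
example = {
    "Bowl": [[[0, 0, 1, 1], [0, 0, 0, 1]]],                # 3 defect pixels of 8
    "Spraybottle": [[[0, 0], [0, 1]], [[0, 0], [0, 0]]],   # 1 defect pixel of 8
}
ratios = ratio_by_category(example)
print(ratios)
```

Pooling pixels across all masks of a category, rather than averaging per-image ratios, weights every pixel equally; which of the two conventions the figure uses is not stated in the captions.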