Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
Wenqiao Li, Yao Gu, Xintao Chen, Xiaohao Xu, Ming Hu, Xiaonan Huang, Yingna Wu
TL;DR
This work tackles industrial anomaly detection in settings where physical priors and real-world dynamics are essential. It introduces Phys-AD, the first large-scale, physics-grounded video dataset for industrial anomalies: 6,434 videos spanning 22 object categories and 47 anomaly types, in which objects interact with robotic arms and motors, with short clips of 60–240 frames. The authors formalize a two-step framework: rule deduction, which combines video and physical priors to build a bank of normal rules, followed by anomaly reasoning on test videos against that rule bank to produce an anomaly score. They also introduce PAEval, a metric that assesses whether visual-language models can not only detect anomalies but also describe them and explain their underlying physical causes. Benchmark results show that current unsupervised, weakly supervised, and video-understanding methods struggle on Phys-AD, particularly on physics-driven reasoning, underscoring the need for approaches that integrate temporal dynamics and physical priors. Together, the Phys-AD benchmark and PAEval provide a new standard for physics-aware anomaly detection in real-world industrial contexts.
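The two-step framework (rule deduction into a normal-rule bank, then anomaly reasoning that yields a score) can be illustrated with a toy sketch. This is not the authors' implementation: their pipeline reasons over raw video and physical knowledge, whereas the sketch below assumes each video has already been summarized as a few named scalar observations. It replaces rule deduction with simple per-feature ranges learned from normal videos plus hand-written physical-prior rules, and it defines the anomaly score as the fraction of bank rules a test video violates. All names (`Rule`, `RuleBank`, `deduce_rules`, `anomaly_score`, the `fan` category and its features) and all numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical input format: each video is pre-summarized as named scalar
# observations (e.g. "rotation_speed"); real Phys-AD inputs are raw frames.
Observation = Dict[str, float]


@dataclass
class Rule:
    description: str                      # human-readable statement of the rule
    check: Callable[[Observation], bool]  # True if an observation obeys the rule


@dataclass
class RuleBank:
    """Normal-rule bank: object category -> list of rules normal videos obey."""
    rules: Dict[str, List[Rule]] = field(default_factory=dict)

    def add(self, category: str, rule: Rule) -> None:
        self.rules.setdefault(category, []).append(rule)


def deduce_rules(category: str, normal_videos: List[Observation],
                 priors: List[Rule], bank: RuleBank, margin: float = 0.1) -> None:
    """Step 1 (toy stand-in for rule deduction): turn the range of each
    observation across normal videos into a rule, then add physical priors."""
    for key in normal_videos[0]:
        values = [video[key] for video in normal_videos]
        lo, hi = min(values), max(values)
        slack = margin * (hi - lo)        # widen the observed range slightly
        low, high = lo - slack, hi + slack
        bank.add(category, Rule(
            f"{key} in [{low:.3g}, {high:.3g}]",
            # Default args bind the current values; a missing key yields NaN,
            # which fails both comparisons and so counts as a violation.
            lambda obs, k=key, a=low, b=high: a <= obs.get(k, float("nan")) <= b,
        ))
    for prior in priors:
        bank.add(category, prior)


def anomaly_score(category: str, video: Observation,
                  bank: RuleBank) -> Tuple[float, List[str]]:
    """Step 2 (toy stand-in for anomaly reasoning): score = fraction of the
    category's rules the test video violates; violated rules act as explanations."""
    rules = bank.rules.get(category, [])
    if not rules:
        raise KeyError(f"no rules deduced for category {category!r}")
    violated = [rule.description for rule in rules if not rule.check(video)]
    return len(violated) / len(rules), violated


# Usage with made-up numbers for a hypothetical "fan" category.
bank = RuleBank()
normal = [{"rotation_speed": 10.0, "wobble": 0.1},
          {"rotation_speed": 12.0, "wobble": 0.2}]
# Hypothetical physical prior: a balanced fan should barely wobble.
priors = [Rule("wobble stays below 0.5",
               lambda obs: obs.get("wobble", float("inf")) < 0.5)]
deduce_rules("fan", normal, priors, bank)

score, reasons = anomaly_score("fan", {"rotation_speed": 11.0, "wobble": 0.9}, bank)
print(f"score={score:.2f}", reasons)
```

Returning the violated rules alongside the score reflects the emphasis on explaining why a video is anomalous rather than only flagging it; a real physics-grounded system would need far richer rules than per-feature ranges, since many anomalies only show up in how an object moves over time.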
Abstract
Humans detect real-world object anomalies by perceiving, interacting, and reasoning based on object-conditioned physical knowledge. The long-term goal of Industrial Anomaly Detection (IAD) is to enable machines to autonomously replicate this skill. However, current IAD algorithms are largely developed and tested on static, semantically simple datasets, which diverge from real-world scenarios where physical understanding and reasoning are essential. To bridge this gap, we introduce the Physics Anomaly Detection (Phys-AD) dataset, the first large-scale, real-world, physics-grounded video dataset for industrial anomaly detection. Collected using a real robot arm and motor, Phys-AD provides a diverse set of dynamic, semantically rich scenarios. The dataset includes more than 6,400 videos of 22 real-world object categories interacting with robot arms and motors, and covers 47 types of anomalies. Anomaly detection in Phys-AD requires visual reasoning that combines physical knowledge with video content to determine whether an object is abnormal. We benchmark state-of-the-art anomaly detection methods under three settings: unsupervised AD, weakly-supervised AD, and video-understanding AD, highlighting their limitations in handling physics-grounded anomalies. Additionally, we introduce the Physics Anomaly Explanation (PAEval) metric, designed to assess the ability of visual-language foundation models to not only detect anomalies but also provide accurate explanations of their underlying physical causes. Our project is available at https://guyao2023.github.io/Phys-AD/.
