Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection

Wenqiao Li, Yao Gu, Xintao Chen, Xiaohao Xu, Ming Hu, Xiaonan Huang, Yingna Wu

TL;DR

This work tackles industrial anomaly detection in settings where physical priors and real-world dynamics are essential. It introduces Phys-AD, the first large-scale, physics-grounded video dataset for industrial anomalies: 6,434 videos covering 22 object categories and 47 anomaly types, recorded as objects interact with robotic arms and motors, in short clips of 60–240 frames. The authors formalize a two-step framework: rule deduction, which combines video and physical priors to build a bank of normal rules, followed by anomaly reasoning on test videos, which yields an anomaly score. They also introduce PAEval, which evaluates visual-language models not only on detection but also on their descriptions and explanations of physical causes. Benchmark results show that current unsupervised, weakly supervised, and video-understanding methods struggle on Phys-AD, particularly on physics-driven reasoning, underscoring the need for approaches that integrate temporal dynamics and physical priors. Together, the Phys-AD benchmark and PAEval provide a new standard for physics-aware anomaly detection in real-world industrial contexts.
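The two-step framework summarized above can be illustrated with a small sketch. This is not the authors' implementation: the per-frame feature vectors, the `temporal_descriptor` summary, and the nearest-rule distance are all assumptions chosen only to show the shape of "build a normal-rule bank, then score a test video against it".

```python
from math import sqrt
from typing import List, Sequence

# Hypothetical representation: a video is a sequence of per-frame feature vectors.
Video = Sequence[Sequence[float]]


def temporal_descriptor(video: Video) -> List[float]:
    """Summarize dynamics as the mean absolute frame-to-frame change per dimension."""
    if len(video) < 2:
        raise ValueError("a video needs at least two frames")
    dims = len(video[0])
    totals = [0.0] * dims
    for prev, cur in zip(video, video[1:]):
        for d in range(dims):
            totals[d] += abs(cur[d] - prev[d])
    return [t / (len(video) - 1) for t in totals]


def build_rule_bank(normal_videos: Sequence[Video]) -> List[List[float]]:
    """Step 1 (rule deduction, assumed form): one descriptor per normal video."""
    return [temporal_descriptor(v) for v in normal_videos]


def anomaly_score(video: Video, rule_bank: Sequence[Sequence[float]]) -> float:
    """Step 2 (anomaly reasoning, assumed form): distance to the nearest normal rule."""
    if not rule_bank:
        raise ValueError("rule bank is empty")
    desc = temporal_descriptor(video)
    return min(
        sqrt(sum((a - b) ** 2 for a, b in zip(desc, rule)))
        for rule in rule_bank
    )


# Toy data: a 1-D feature that advances steadily in normal clips.
normal_videos = [
    [[0.0], [1.0], [2.0], [3.0]],
    [[0.0], [1.1], [2.2], [3.3]],
]
stalled_video = [[0.0], [1.0], [1.0], [1.0]]  # motion stops after the first step

bank = build_rule_bank(normal_videos)
score_normal = anomaly_score(normal_videos[0], bank)
score_stalled = anomaly_score(stalled_video, bank)
```

In this toy setup the stalled clip scores higher than a normal clip because its motion profile is far from every entry in the bank. A real system would replace the hand-made features with learned video representations and explicit physical priors.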

Abstract

Humans detect real-world object anomalies by perceiving, interacting, and reasoning based on object-conditioned physical knowledge. The long-term goal of Industrial Anomaly Detection (IAD) is to enable machines to autonomously replicate this skill. However, current IAD algorithms are largely developed and tested on static, semantically simple datasets, which diverge from real-world scenarios where physical understanding and reasoning are essential. To bridge this gap, we introduce the Physics Anomaly Detection (Phys-AD) dataset, the first large-scale, real-world, physics-grounded video dataset for industrial anomaly detection. Collected using a real robot arm and motor, Phys-AD provides a diverse set of dynamic, semantically rich scenarios. The dataset includes more than 6,400 videos of 22 real-world object categories interacting with robot arms and motors, and it exhibits 47 types of anomalies. Anomaly detection in Phys-AD requires visual reasoning, combining both physical knowledge and video content to determine object abnormality. We benchmark state-of-the-art anomaly detection methods under three settings: unsupervised AD, weakly-supervised AD, and video-understanding AD, highlighting their limitations in handling physics-grounded anomalies. Additionally, we introduce the Physics Anomaly Explanation (PAEval) metric, designed to assess the ability of visual-language foundation models to not only detect anomalies but also provide accurate explanations for their underlying physical causes. Our project is available at https://guyao2023.github.io/Phys-AD/.

Paper Structure

This paper contains 49 sections, 5 equations, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Towards visual discrimination of physical dynamics in real-world industrial object anomaly detection. We illustrate objects, interactions, and time-sequenced videos from the Physics-Grounded Anomaly Detection dataset: (a) Object; (b) Interaction: Applied actions shown with directional arrows; (c) Video with Physical Dynamics: Temporal sequences showing normal and abnormal states, highlighting anomalies like leaks, misalignments, and cracks. By focusing on the dynamic behaviors of complex objects, we enhance understanding of interactions and failure modes in real-world settings, where both structure and motion contribute to anomaly detection.
  • Figure 2: Human-like decision-making process for physics-grounded object anomaly detection. We illustrate the sequential approach of a human-like agent for evaluating an object’s normality. First, the agent perceives relevant physical attributes (e.g., plastic and elastic), then interacts by performing a physical action (e.g., squeezing), and finally reasons based on the visual feedback and attribute changes (e.g., surface shape change) to determine whether the object is normal or anomalous. This mirrors a human’s natural process of reasoning over physics in objects.
  • Figure 3: Interactions for understanding implicit physical laws in the Phys-AD dataset. We showcase various object interactions from the Phys-AD dataset, where different actions (indicated by motion directions) are used to explore and reason about the underlying physical properties and behaviors of each object. The colored arrows indicate the interaction directions and axes, highlighting how physical interactions reveal the implicit physics governing each object.
  • Figure 4: Categorization of anomalies based on persistence in the Phys-AD dataset. We show examples of normal and abnormal functioning in common objects, divided into two anomaly types: persistent and intermittent. (a) Persistent anomalies, such as continuous obstruction in the U Disk or permanent malfunction of the Sticky Roller, are visible throughout the operation. (b) In contrast, intermittent anomalies, like occasional jamming of the U Disk or breakage in the Sticky Roller after initial operation, only appear at specific points in time. This classification provides insight into both constant and sporadic failures in object interactions.
  • Figure 5: Data collection pipeline for the Phys-AD dataset. (a) Manipulation of a toothpaste tube using a UR5 robotic arm. (b) Manipulation of a U Disk and fan via servo and motor.
  • ...and 8 more figures
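
The persistent versus intermittent distinction in Figure 4 can be made concrete with a short sketch. It assumes that a per-frame anomaly score is already available and that a fixed threshold separates abnormal from normal frames. The function name `classify_persistence`, the threshold value, and the "all frames" criterion are illustrative assumptions, not definitions taken from the paper.

```python
from typing import Sequence


def classify_persistence(frame_scores: Sequence[float], threshold: float = 0.5) -> str:
    """Label a clip from per-frame anomaly scores (hypothetical criterion).

    - "persistent": every frame exceeds the threshold (visible throughout).
    - "intermittent": only some frames exceed it (appears at specific times).
    - "normal": no frame exceeds it.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    flagged = [score > threshold for score in frame_scores]
    if all(flagged):
        return "persistent"
    if any(flagged):
        return "intermittent"
    return "normal"


# Toy per-frame scores for three clips.
always_blocked = classify_persistence([0.9, 0.8, 0.95])
occasional_jam = classify_persistence([0.1, 0.9, 0.1])
healthy = classify_persistence([0.1, 0.2, 0.05])
```

Requiring every frame to be flagged is a deliberately strict reading of "visible throughout the operation". A more tolerant variant could instead compare the fraction of flagged frames against a cutoff.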