PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection

Qihang Zhou; Jiangtao Yan; Shibo He; Wenchao Meng; Jiming Chen

TL;DR

PointAD introduces a CLIP-based framework for zero-shot 3D anomaly detection that unifies point- and pixel-level reasoning through multi-view renderings. It uses multiple instance learning (MIL) to aggregate 3D glocal (global and local) semantics and multi-task learning (MTL) to constrain 2D anomaly semantics, with a hybrid prompt learning strategy that encodes generic normality and abnormality. The method supports plug-and-play RGB integration for zero-shot multimodal 3D detection and reports superior performance on MVTec3D-AD, Eyecandies, and Real3D-AD, including in cross-dataset generalization. Because it does not memorize target samples, the approach offers a scalable pathway for detecting unseen 3D anomalies in diverse objects, with practical implications for privacy-preserving industrial inspection.

Abstract

Zero-shot (ZS) 3D anomaly detection is a crucial yet unexplored field that addresses scenarios where target 3D training samples are unavailable due to practical concerns like privacy protection. This paper introduces PointAD, a novel approach that transfers the strong generalization capabilities of CLIP to recognizing 3D anomalies on unseen objects. PointAD provides a unified framework to comprehend 3D anomalies from both points and pixels. In this framework, PointAD renders 3D anomalies into multiple 2D renderings and projects them back into 3D space. To instill generic anomaly semantics in PointAD, we propose hybrid representation learning that optimizes the learnable text prompts from 3D and 2D through auxiliary point clouds. Jointly optimizing point and pixel representations helps the model grasp underlying 3D anomaly patterns, which in turn supports detecting and segmenting anomalies on diverse unseen 3D objects. Through the alignment of 3D and 2D space, our model can directly integrate RGB information, further enhancing the understanding of 3D anomalies in a plug-and-play manner. Extensive experiments show the superiority of PointAD in ZS 3D anomaly detection across diverse unseen objects.
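
The abstract's "render into multiple 2D views, then project back into 3D" step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the function names (`rotate_y`, `project_to_pixel`, `backproject_scores`) are hypothetical, the camera is orthographic and rotates only about the y-axis, occlusion is ignored, and per-view scores are fused by a plain average.

```python
import math

def rotate_y(point, angle):
    """Rotate a 3D point about the y-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def project_to_pixel(point, size):
    """Orthographic projection of a point in [-1, 1]^3 onto a size x size grid.

    Assumption: depth is simply dropped, so occluded points are not handled.
    """
    x, y, _ = point
    col = min(size - 1, max(0, int((x + 1.0) / 2.0 * size)))
    row = min(size - 1, max(0, int((1.0 - y) / 2.0 * size)))
    return row, col

def backproject_scores(points, view_angles, score_maps):
    """For each point, average the 2D anomaly scores it lands on across views."""
    size = len(score_maps[0])
    point_scores = []
    for p in points:
        total = 0.0
        for angle, score_map in zip(view_angles, score_maps):
            row, col = project_to_pixel(rotate_y(p, angle), size)
            total += score_map[row][col]
        point_scores.append(total / len(view_angles))
    return point_scores
```

Given one 2D score map per view (a list of rows), `backproject_scores(points, view_angles, score_maps)` returns one fused score per point. A real pipeline would also need visibility handling so that a point hidden in a given view does not inherit the score of the surface in front of it.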

Paper Structure (54 sections, 9 equations, 19 figures, 25 tables)

Introduction
Related Work
3D Anomaly Detection
3D Feature Extraction
Prompt Learning
PointAD
A Review of CLIP
Overview of PointAD
Multi-View Rendering
Representations for 3D and 2D information
Hybrid representation learning
MIL-based 3D representation learning
MTL-based 2D representation learning
Training and Inference
ZS 3D/M3D inference
...and 39 more sections

Figures (19)

Figure 1: Motivation of zero-shot 3D anomaly detection. (a): Top: The hole in the cookie looks similar to the background. Bottom: Surface damage on the potato is hard to distinguish from the object foreground. In these cases, RGB information alone makes it difficult to detect anomalies that imitate the color patterns of the background or foreground. However, effective recognition can be achieved by modeling the point relations within the corresponding point clouds. (b) and (c) depict the difference between the ZS and unsupervised settings.
Figure 1: Performance comparison on ZS 3D anomaly detection in "one-vs-rest" setting.
Figure 2: Framework of PointAD. To transfer the strong generalization of CLIP from 2D to 3D, point clouds and their corresponding ground truths are each rendered into 2D renderings from multiple views. The vision encoder of CLIP then processes the renderings to derive 2D global and local representations. These representations are transformed into glocal 3D point representations to learn 3D anomaly semantics within point clouds. Finally, we align normality and abnormality from both the point perspective (multiple instance learning) and the pixel perspective (multi-task learning), and propose a hybrid loss that jointly optimizes the text embeddings of the learnable normality and abnormality text prompts, capturing the underlying generic anomaly patterns.
Figure 3: Visualization of anomaly score maps in ZS 3D anomaly detection. Point clouds of diverse objects are input into PointAD to generate 2D and 3D representations. Each row visualizes the anomaly score maps of 2D renderings from different views, and the final point score maps are also presented. More visualizations are provided in the appendix.
Figure 3: Performance comparison on ZS 3D anomaly detection in cross-dataset setting.
...and 14 more figures
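
The Figure 2 caption describes aligning visual representations with text embeddings for normality and abnormality. A minimal sketch of that CLIP-style comparison is given below, under stated assumptions: the function names are hypothetical, the temperature of 0.07 is a placeholder, and the top-k mean used to turn per-point scores into one object-level score is a common MIL-style pooling swapped in for illustration, not necessarily the aggregation PointAD uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def abnormal_probability(feature, text_normal, text_abnormal, temperature=0.07):
    """Softmax over similarities to the normal and abnormal text embeddings.

    Returns the probability assigned to the abnormal prompt.
    """
    logit_n = cosine(feature, text_normal) / temperature
    logit_a = cosine(feature, text_abnormal) / temperature
    m = max(logit_n, logit_a)  # subtract the max for numerical stability
    exp_n, exp_a = math.exp(logit_n - m), math.exp(logit_a - m)
    return exp_a / (exp_n + exp_a)

def topk_mean(scores, k):
    """Assumed MIL-style pooling: mean of the k highest per-point scores."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)
```

Applied per point (or per pixel), `abnormal_probability` yields a score map that can be used for segmentation, and `topk_mean` over those scores gives a single detection score for the object. In the paper's setting the two text embeddings come from learnable prompts; here they are simply passed in as vectors.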
