Towards Zero-shot 3D Anomaly Localization

Yizhou Wang, Kuan-Chuan Peng, Yun Fu

TL;DR

This work tackles zero-shot 3D anomaly localization, where no normal training data are available for the target class. It introduces 3DzAL, a patch-level contrastive framework that combines three components: pseudo anomalies generated from task-irrelevant 3D data; a memory bank holding RGB, FPFH, and learnable 3D features; and a 3D normalcy classifier whose inputs are adversarially perturbed to produce robust anomaly scores. A key insight is that a randomly initialized CNN exhibits an inductive bias that localizes regions of interest in 3D XYZ data, enabling effective pseudo anomaly synthesis without pre-trained 3D models. Across 90 zero-shot trials on the MVTec 3D-AD dataset, 3DzAL achieves state-of-the-art pixel-level and image-level metrics, demonstrating strong generalization to unseen classes and validating the utility of task-irrelevant data and input perturbations for zero-shot 3D anomaly detection and localization.

Abstract

3D anomaly detection and localization is of great significance for industrial inspection. Prior 3D anomaly detection and localization methods assume that the test data belong to the same category as the normal training data. However, in real-world applications, normal training data for the target 3D objects can be unavailable due to issues such as data privacy or export control regulations. To tackle this challenge, we identify a new task, zero-shot 3D anomaly detection and localization, where the training and testing classes do not overlap. To this end, we design 3DzAL, a novel patch-level contrastive learning framework that learns more representative features from pseudo anomalies generated using the inductive bias from task-irrelevant 3D xyz data. Furthermore, we train a normalcy classifier network to distinguish normal patches from pseudo anomalies, and combine the classification result with feature distance to design anomaly scores. Instead of directly using the patch point clouds, we apply adversarial perturbations to the input patch xyz data before feeding them into the 3D normalcy classifier for the classification-based anomaly score. We show that 3DzAL outperforms state-of-the-art anomaly detection and localization methods.
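
As a rough illustration of the last step in the abstract, the sketch below perturbs a patch's xyz coordinates before scoring it with a normalcy classifier. Everything concrete here is an assumption made for illustration: the linear "classifier" on the patch centroid (`normalcy_prob`), the signed-gradient (FGSM-style) update, the choice to step in the direction that lowers the predicted probability of "normal", and the step size `eps`. The abstract does not specify the perturbation rule or the classifier architecture.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def normalcy_prob(patch, w, b):
    """Hypothetical stand-in classifier: P(normal) from the patch's mean xyz."""
    n = len(patch)
    centroid = [sum(p[i] for p in patch) / n for i in range(3)]
    return sigmoid(sum(wi * ci for wi, ci in zip(w, centroid)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def perturb_patch(patch, w, b, eps=0.01):
    """One signed-gradient step on the xyz input (assumed FGSM-style rule).

    For this linear toy model, d log P(normal) / d x_i = (1 - p) * w_i / n,
    so the gradient sign is sign(w_i). Stepping against it never increases
    P(normal). A real network would need automatic differentiation here.
    """
    return [[x - eps * sign(wi) for x, wi in zip(p, w)] for p in patch]

def classification_score(patch, w, b, eps=0.01):
    """Anomaly score = 1 - P(normal), evaluated on the perturbed patch."""
    return 1.0 - normalcy_prob(perturb_patch(patch, w, b, eps), w, b)

# Toy usage: a two-point patch and arbitrary illustrative weights.
patch = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
w, b = [1.0, -2.0, 0.0], 0.0
score_clean = 1.0 - normalcy_prob(patch, w, b)
score_perturbed = classification_score(patch, w, b)
```

In this toy setting the perturbed score is slightly higher than the clean one, because the step moves the patch toward the classifier's "abnormal" side.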

Paper Structure

This paper contains 9 sections, 11 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Problem overview. Current 3D anomaly detection and localization works entail training on the normal data of one class and testing on the normal and abnormal data of the same class. We extend this setting by testing on other classes for which no normal training data are available. This zero-shot setting is practical when such data are unavailable (e.g., due to data privacy or export control laws). GT denotes ground truth.
  • Figure 2: Framework overview. Our proposed 3DzAL framework adopts three main branches to extract features from both 2D and 3D data of an object. The RGB branch extracts features from the 2D image data of the object using a ResNet pre-trained on ImageNet. The FPFH branch extracts handcrafted FPFH features from the 3D point cloud data. The point cloud branch employs a learnable network (PointNet++) to extract features. This network is trained with two losses: a patch-level contrastive learning loss, which takes inductive bias-based pseudo anomaly patches as negative samples and normal patches as positive samples, and a representation disentanglement loss, which pushes the FPFH features and the learned 3D features apart. The features of the three branches are concatenated and stored in the memory bank, on which coreset selection is performed. In addition, a normalcy classifier is trained with the binary cross-entropy loss to distinguish pseudo anomaly patches from normal patches.
  • Figure 3: Inductive bias of random networks. We feed the xyz data of abnormal examples into a randomly initialized, untrained ResNet-50 and visualize the attention maps. These maps show that the random network has an inductive bias toward covering the locations of interest, including the locations shown in the ground truth.
  • Figure 4: Pseudo anomaly generation. Overview of our proposed patch-level 3D pseudo anomaly sample generation process for both "adding" and "removing" type anomalies.
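
The memory-bank step described in the Figure 2 caption (concatenate per-patch features from the three branches, then perform coreset selection) can be sketched as follows. This is a minimal sketch under stated assumptions: the caption only says that "a coreset selection is performed", so the greedy k-center (farthest-point) rule used here is an assumed choice, and the function names `concat_branches` and `greedy_coreset` as well as the toy feature values are hypothetical.

```python
import math

def concat_branches(rgb, fpfh, learned):
    """Concatenate per-patch features from the three branches (hypothetical helper)."""
    return [list(r) + list(f) + list(l) for r, f, l in zip(rgb, fpfh, learned)]

def greedy_coreset(features, k, start=0):
    """Pick k row indices by greedy k-center (farthest-point) sampling.

    This is an assumed selection rule; the source does not name one.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and len(features)")
    selected = [start]
    # min_dist[i] = distance from features[i] to its nearest selected row
    min_dist = [math.dist(f, features[start]) for f in features]
    while len(selected) < k:
        # The next pick is the row farthest from the current coreset.
        nxt = max(range(len(features)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, f in enumerate(features):
            d = math.dist(f, features[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected

# Toy usage: four patches with 1-D features per branch (illustrative values only).
rgb = [[0.0], [0.1], [10.0], [5.0]]
fpfh = [[0.0], [0.0], [0.0], [0.0]]
learned = [[0.0], [0.0], [0.0], [0.0]]
bank = concat_branches(rgb, fpfh, learned)
coreset_idx = greedy_coreset(bank, k=3)
memory_bank = [bank[i] for i in coreset_idx]
```

Farthest-point sampling keeps patches that are spread out in feature space, so near-duplicates such as the second toy patch are dropped first. This pure-Python version costs O(n·k) distance evaluations; a practical implementation would likely vectorize it.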