Hierarchical Point-Patch Fusion with Adaptive Patch Codebook for 3D Shape Anomaly Detection

Xueyang Kang, Zizhao Li, Tian Lan, Dong Gong, Kourosh Khoshelham, Liangliang Nan

Abstract

3D shape anomaly detection is a crucial task for industrial inspection and geometric analysis. Existing deep learning approaches typically learn representations of normal shapes and identify anomalies via out-of-distribution feature detection or decoder-based reconstruction. They often fail to generalize across diverse anomaly types and scales, such as global geometric errors (e.g., planar shifts, angle misalignments), and are sensitive to noisy or incomplete local points during training. To address these limitations, we propose a hierarchical point-patch anomaly scoring network that jointly models regional part features and local point features for robust anomaly reasoning. An adaptive patchification module integrates self-supervised decomposition to capture complex structural deviations. Beyond evaluations on public benchmarks (Anomaly-ShapeNet and Real3D-AD), we release an industrial test set with real CAD models exhibiting planar, angular, and structural defects. Experiments on public and industrial datasets show superior AUC-ROC and AUC-PR performance, including over 40% point-level improvement on the new industrial anomaly type and average object-level gains of 7% on Real3D-AD and 4% on Anomaly-ShapeNet, demonstrating strong robustness and generalization.

Paper Structure

This paper contains 24 sections, 10 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Patch feature fusion for 3D shape anomaly detection in point clouds. Normal patches $\{\mathbf{S}_j\}, j \in \{1,\ldots,6\}$ and abnormal patches $\{\tilde{\mathbf{S}}_j\}$ are ordered by distance to the object center. The left heatmap visualizes the cosine similarity between patch features, where higher values indicate stronger similarity between normal and abnormal patches. Anomalous regions exhibit distinctive feature discrepancies in the last row, indicating ambiguous patch correspondences. These patch-level differences are fused to guide pointwise anomaly detection, with red regions in the right point cloud marking detected anomalies.
  • Figure 2: Overview of the proposed shape anomaly detection framework. The pipeline processes a normal shape $\mathbf{S}$ through several modules: Patchification segments the input into patches $\{\mathbf{S}_j\}$, while Negative Augmentation generates pseudo-anomalous shapes $\tilde{\mathbf{S}}$ that are also patchified during training (red regions indicate anomalies); (a) Adaptive Patch Feature Extraction computes features for each patch by averaging its points and querying a pre-trained UNet to obtain patch features $\{\mathbf{p}_j\}$; (b) Template Codebook stores a dictionary of normal template patch features $\{\mathbf{t}_k\}$, from which each patch retrieves its closest matching template; (c) Point to Patch Cross-attention fuses the template features $\{\mathbf{t}_k\}$ with point-level features extracted from the pre-trained UNet encoder through a multi-head attention mechanism; (d) Patch Score Modulation modulates the cross-attention outputs using similarity scores between the normal template features $\{\mathbf{t}_k\}$ and the input patch features $\{\mathbf{p}_j\}$, with the modulated features fed into MLP regression layers to predict the anomaly score $\delta_{\text{pred}}$. During training, (e) the loss is computed as $\hat{\mathbf{o}} - \mathbf{o}_{gt}$, where $\mathbf{o}_{gt}$ represents the ground-truth offset direction derived from the difference $\mathbf{S} - \tilde{\mathbf{S}}$. During inference (indicated by green arrows), only the test shape is input for anomaly detection.
  • Figure 3: Qualitative comparison on three datasets (green letters): our crafted industrial components (1st–2nd rows), Real3D-AD (3rd), and Anomaly-ShapeNet (4th). The dashed line indicates different object classes. Representative anomaly types (in blue) include displacement, bending, concavity, sinks, and bulges. Anomaly points are highlighted in red. Ground truth uses red masks (first two rows) or red contours (last two rows) to indicate anomaly regions overlaid with an anomalous mesh. Each column shows results from a specific method.
  • Figure 4: AUC-ROC performance under varying configuration parameters. Three parameter groups are compared: voxel size, patch number, and patch size. Each parameter is evaluated at object-level (solid lines) and point-level (dashed lines).
  • Figure 5: t-SNE visualization of point feature distributions. '+' denotes patch centers; colors indicate membership in one of the 32 patches. (a) Without fusion, features scatter with poor separation, lacking discriminative structure despite close patch centers. (b) With patch feature modulation and fusion, clusters are well-separated.
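
The patch-to-template matching described in the captions of Figures 1 and 2 can be illustrated with a small sketch. It is not the authors' implementation: it assumes a patch feature is simply the mean of its point features, that the codebook is a plain list of template vectors, and that a patch-level discrepancy can be taken as one minus the best cosine similarity. The function names (`mean_feature`, `retrieve_template`, `patch_discrepancy`) are hypothetical, and the learned components (UNet features, cross-attention, MLP regression) are omitted.

```python
import math


def mean_feature(point_features):
    """Average per-point feature vectors into a single patch feature."""
    n = len(point_features)
    dim = len(point_features[0])
    return [sum(f[d] for f in point_features) / n for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_template(patch_feature, codebook):
    """Return (index, similarity) of the closest template in the codebook."""
    sims = [cosine_similarity(patch_feature, t) for t in codebook]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]


def patch_discrepancy(patch_feature, codebook):
    """Illustrative patch-level score (an assumption of this sketch):
    1 - best cosine similarity, so poorly matched patches score higher."""
    _, sim = retrieve_template(patch_feature, codebook)
    return 1.0 - sim


# Toy codebook with two orthogonal "normal" templates (made-up values).
codebook = [[1.0, 0.0], [0.0, 1.0]]

# A patch whose points average to the first template matches it closely.
normal_patch = mean_feature([[0.9, 0.1], [1.1, -0.1]])

# A patch lying between both templates has no clear correspondence.
ambiguous_patch = mean_feature([[1.0, 1.0], [1.0, 1.0]])

normal_score = patch_discrepancy(normal_patch, codebook)
ambiguous_score = patch_discrepancy(ambiguous_patch, codebook)
```

In this toy setting the ambiguous patch receives the larger discrepancy, mirroring the caption's remark that anomalous regions show ambiguous patch correspondences. In the paper the similarity scores modulate cross-attention outputs rather than serving directly as the anomaly score.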