A Semantically Disentangled Unified Model for Multi-category 3D Anomaly Detection

SuYeon Kim, Wongyu Lee, MyeongAh Cho

Abstract

3D anomaly detection aims to detect and localize defects in 3D point clouds using models trained solely on normal data. While a unified model improves scalability by learning across multiple categories, it often suffers from Inter-Category Entanglement (ICE), where latent features from different categories overlap, causing the model to adopt incorrect semantic priors during reconstruction and ultimately yielding unreliable anomaly scores. To address this issue, we propose the Semantically Disentangled Unified Model for 3D Anomaly Detection, which reconstructs features conditioned on disentangled semantic representations. Our framework consists of three key components: (i) Coarse-to-Fine Global Tokenization for forming instance-level semantic identity, (ii) Category-Conditioned Contrastive Learning for disentangling category semantics, and (iii) a Geometry-Guided Decoder for semantically consistent reconstruction. Extensive experiments on Real3D-AD and Anomaly-ShapeNet demonstrate that our method achieves state-of-the-art performance among both unified and category-specific models, improving object-level AUROC by 2.8% and 9.1%, respectively, while enhancing the reliability of unified 3D anomaly detection.
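The second component, Category-Conditioned Contrastive Learning (C3L), disentangles category semantics by pulling global tokens of the same category together in the latent space while pushing tokens of different categories apart. A minimal sketch of one way such a category-conditioned, supervised-style contrastive loss could look is given below; the function name, the NumPy formulation, and the temperature value are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def category_conditioned_contrastive_loss(tokens, labels, temperature=0.07):
    """Illustrative supervised-style contrastive loss: global tokens sharing a
    category label are treated as positives, all others as negatives.
    (Hypothetical sketch; not the paper's exact C3L formulation.)"""
    z = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)  # unit-norm global tokens
    logits = z @ z.T / temperature                              # pairwise cosine similarities
    n = len(z)
    eye = np.eye(n, dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & ~eye           # same-category pairs
    logits = np.where(eye, -np.inf, logits)                     # exclude self-similarity
    # row-wise log-softmax over all other samples (numerically stable log-sum-exp)
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    # average negative log-probability over each sample's positive pairs
    per_sample = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_sample.mean()
```

Under this formulation, well-separated category clusters yield a near-zero loss, while entangled tokens (the ICE failure mode, e.g. chicken vs. duck features overlapping) incur a high loss, so minimizing it directly encourages the disentangled manifolds the paper targets.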

Paper Structure

This paper contains 21 sections, 19 equations, 7 figures, and 18 tables.

Figures (7)

  • Figure 1: Overview of different paradigms for 3D anomaly detection. (a) Category-specific models are trained on each object class separately. (b) A unified model handles multiple categories with local features but often suffers from inter-category feature entanglement. (c) Our method achieves semantically consistent results by aggregating coarse-to-fine geometric cues into category-aware global features, disentangling them via C3L, and guiding reconstruction with input geometry.
  • Figure 2: t-SNE visualization and quantitative analysis of semantic disentanglement. MC3D-AD exhibits entangled feature clusters across similar categories (e.g., chicken, duck, gemstone), where samples with low category classification scores are reconstructed under uncertain semantic priors, yielding high reconstruction errors on normal data and increased false positives. Our model instead forms well-separated, semantically aligned manifolds with higher classification scores and lower reconstruction errors, enabling reliable category-conditioned reconstruction.
  • Figure 3: The overview of the proposed method. Our method consists of two main stages: Semantically Disentangled Representation Learning and Semantically Disentangled Reconstruction. Given an input point cloud, CFGT encodes multi-resolution geometric features into a category-aware global token, C3L disentangles the latent semantics, and GGD reconstructs the object conditioned on these disentangled semantics and geometric priors.
  • Figure 4: Overview of the Geometry-Guided Decoder (GGD). Geometric priors $\mathbf{B}_{\mathrm{geo}}$ guide the attention mechanism to refine semantic features, and the results are aggregated through a feed-forward network.
  • Figure 5: Qualitative comparison with MC3D-AD across various object categories on the Real3D-AD dataset. Red regions indicate abnormalities (e.g., bulge, sink). MC3D-AD often misses true anomalies or yields false positives, whereas our method delivers more accurate and complete localization aligned with the ground truth.
  • ...and 2 more figures