Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection

Yuanpeng Tu; Boshen Zhang; Liang Liu; Yuxi Li; Xuhai Chen; Jiangning Zhang; Yabiao Wang; Chengjie Wang; Cai Rong Zhao

TL;DR

The paper tackles multi-modal 3D industrial anomaly detection by addressing the domain gap in transferred pre-trained features. It introduces Local-to-global Self-supervised Feature Adaptation (LSFA), which jointly optimizes intra-modal feature compactness (IFC) and cross-modal local-to-global consistency (CLC) to produce task-oriented representations for RGB and 3D data. Using dynamic memory banks and multi-granularity signals, LSFA improves anomaly localization and detection, setting a new state of the art on MVTec-3D AD with an I-AUROC of $97.1\%$ and reporting strong results on Eyecandies. The method stays efficient by avoiding heavy fine-tuning and remains robust in few-shot regimes, which makes it practical for industrial inspection systems.

Abstract

Industrial anomaly detection is generally addressed as an unsupervised task that aims at locating defects with only normal training samples. Recently, numerous 2D anomaly detection methods have been proposed and have achieved promising results; however, using only 2D RGB data as input is not sufficient to identify imperceptible geometric surface anomalies. Hence, in this work, we focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets, i.e., ImageNet, to construct feature databases. We empirically find that directly using these pre-trained models is not optimal: it can either fail to detect subtle defects or mistake abnormal features for normal ones. This may be attributed to the domain gap between the target industrial data and the source data. To address this problem, we propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representations for anomaly detection. Both intra-modal adaptation and cross-modal alignment are optimized from a local-to-global perspective in LSFA to ensure representation quality and consistency in the inference stage. Extensive experiments demonstrate that our method not only brings a significant performance boost to feature-embedding-based approaches, but also clearly outperforms previous State-of-The-Art (SoTA) methods on both the MVTec-3D AD and Eyecandies datasets, e.g., LSFA achieves 97.1% I-AUROC on MVTec-3D AD, surpassing the previous SoTA by +3.4%.
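
The "feature databases" mentioned in the abstract can be pictured as a memory bank of patch features from normal samples, queried at test time. The sketch below is only an illustration under stated assumptions: the nearest-neighbor Euclidean lookup and the max-over-patches image score are common choices in feature-embedding methods and are not confirmed by this summary as the paper's exact scoring rule. The function names (`build_memory_bank`, `patch_scores`, `image_score`) are hypothetical.

```python
import math

def build_memory_bank(normal_feature_maps):
    """Collect the patch features of all normal training samples in one list."""
    bank = []
    for patches in normal_feature_maps:
        bank.extend(patches)
    return bank

def patch_scores(test_patches, bank):
    """Score each test patch by its distance to the nearest stored normal patch.

    Assumption: Euclidean nearest-neighbor distance is used as the anomaly score.
    """
    return [min(math.dist(patch, stored) for stored in bank) for patch in test_patches]

def image_score(test_patches, bank):
    """Assumption: the image-level score is the largest patch-level score."""
    return max(patch_scores(test_patches, bank))
```

In this picture, a domain gap in the pre-trained features would show up as distances that are uninformative: normal patches may land far from the bank, or defective patches close to it. This is the failure mode the abstract attributes to using pre-trained models directly.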

Paper Structure (14 sections, 9 equations, 6 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Methodology
Overview
CLC: Cross-modal Local-to-global Consistency Alignment
IFC: Intra-modal Feature Compactness Optimization
Defect Localization
Experiments
Experimental Details
Comparison on 3D AD Benchmark
Ablation Study
Few-shot Anomaly Detection
Comparison with Fine-tuning Methods
Conclusion

Figures (6)

Figure 1: Illustrations of the MVTec-3D AD dataset. The second and third rows are the input point cloud data and RGB data. The fourth and fifth rows are prediction results. Our method can avoid the overestimation issues (as shown on the left) and produce more accurate results for categories with complex textures (as shown on the right).
Figure 2: The overall pipeline of our method. The features of two modalities are adapted from two views: Intra-modal Feature Compactness optimization (IFC) and Cross-modal Local-to-global Consistency alignment (CLC). The fine-tuned results of the adaptors are utilized for final defect localization.
Figure 3: The proposed inter-modal local-to-global consistency alignment. For the local view, the similarity of patch-wise features at the same/different locations of the RGB image and its corresponding 3D point cloud is maximized/minimized to guarantee local-geometry consistency between the two modalities. For the global view, instance-wise features clustered from patch-wise features are optimized in a similar way.
Figure 4: The proposed local-to-global compactness optimization strategy, where both prototype-wise global-level and patch-wise local-level memory banks are involved.
Figure 5: Qualitative results of RGB/D modality.
...and 1 more figure
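
The local view described in the Figure 3 caption (pull together patch features at the same location across modalities, push apart those at different locations) can be sketched as a contrastive objective. The code below is a minimal illustration, assuming an InfoNCE-style loss with cosine similarity and a temperature; this summary does not give the paper's exact formulation, and the names `patch_contrastive_loss` and `temperature` are hypothetical.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def patch_contrastive_loss(rgb_patches, pc_patches, temperature=0.1):
    """InfoNCE-style loss over the patch features of one RGB/point-cloud pair.

    rgb_patches[i] and pc_patches[i] describe the same spatial location and
    form the positive pair; every other location j != i serves as a negative.
    Assumption: cosine similarity scaled by a temperature.
    """
    if len(rgb_patches) != len(pc_patches):
        raise ValueError("both modalities need the same number of patches")
    rgb = [l2_normalize(p) for p in rgb_patches]
    pc = [l2_normalize(p) for p in pc_patches]
    total = 0.0
    for i, anchor in enumerate(rgb):
        logits = [sum(a * b for a, b in zip(anchor, other)) / temperature
                  for other in pc]
        # Numerically stable log-sum-exp over all candidate locations.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(rgb)
```

The loss is small when each RGB patch is most similar to the point-cloud patch at its own location and grows when the correspondence is broken. For the global view, the caption suggests the same idea applied to instance-wise features clustered from the patch features, which could reuse this function with one feature per instance.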
