SuperSimpleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection

Blaž Rolih, Matic Fučka, Danijel Skočaj

TL;DR

Surface defect detection requires high accuracy, robustness, and fast operation while leveraging all available data. SuperSimpleNet unifies unsupervised and supervised learning by extending SimpleNet with a feature-space anomaly generator, an upscaling-capable feature extractor, a feature adaptor, and a segmentation-detection pipeline that includes a global classification head. It achieves state-of-the-art results on both supervised (SensumSODF, KSDD2) and unsupervised (MVTec AD, VisA) benchmarks, while maintaining an inference time of about 9.3 ms and a throughput of 268 images per second. The approach is robust and stable across training runs and data regimes, and ablation analyses clarify how each architectural component and training strategy contributes to performance and efficiency.

Abstract

The aim of surface defect detection is to identify and localise abnormal regions on the surfaces of captured objects, a task that's increasingly demanded across various industries. Current approaches frequently fail to fulfil the extensive demands of these industries, which encompass high performance, consistency, and fast operation, along with the capacity to leverage the entirety of the available training data. Addressing these gaps, we introduce SuperSimpleNet, an innovative discriminative model that evolved from SimpleNet. This advanced model significantly enhances its predecessor's training consistency, inference time, as well as detection performance. SuperSimpleNet operates in an unsupervised manner using only normal training images but also benefits from labelled abnormal training images when they are available. SuperSimpleNet achieves state-of-the-art results in both the supervised and the unsupervised settings, as demonstrated by experiments across four challenging benchmark datasets. Code: https://github.com/blaz-r/SuperSimpleNet .

Paper Structure (20 sections, 4 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Model comparison for both the supervised (KSDD2 and SensumSODF) and the unsupervised (MVTec AD and VisA) setting. The Y-axis represents the anomaly detection performance measured in AUROC, and the X-axis represents inference time in milliseconds using an NVIDIA Tesla V100S (more details in the results section). The size of the circles represents the model's parameter size. Additionally, the table below indicates whether each model meets specific speed requirements (an inference time below 10 ms) and whether it is capable of working in the unsupervised and/or the supervised setting. If a model is designed specifically for either the supervised or the unsupervised setting but is theoretically applicable to the other, we marked the opposing cell with a '*'. Two methods (marked with '-') lack publicly available code, preventing us from assessing their speed. SuperSimpleNet stands out as the only model meeting all criteria.
  • Figure 2: SuperSimpleNet's architecture. Features are first extracted, upscaled, and adapted. During training, synthetic anomalies are generated in latent space by adding Gaussian noise to the adapted feature map $\mathcal{A}$. The noise is restricted to regions that are both selected by a binarised Perlin mask and non-anomalous (depicted by $\tilde{\epsilon}$). The perturbed feature map $\mathcal{P}$ is then used as the input for the segmentation head to predict an anomaly mask $\mathrm{M}_o$. The predicted anomaly mask $\mathrm{M}_o$ and the perturbed feature map $\mathcal{P}$ are then used as the input for the classification head, producing the anomaly score $s$. During training, the predicted mask $\mathrm{M}_o$ is supervised by the anomaly mask $\mathrm{M}$ and the anomaly score $s$ by the label $y$, where $y$ is set to 1 if the image contains an anomaly (synthetic or real) and to 0 otherwise. During inference, the anomaly generation phase is omitted, and $\mathrm{M}_o$ and $s$ are produced directly from the adapted feature map $\mathcal{A}$.
  • Figure 3: Synthetic anomaly generation. Synthetic anomaly masks $\mathrm{M}_a$ are generated using Perlin noise. In the unsupervised setting, Gaussian noise is added to all the regions denoted by the thresholded Perlin noise mask $\mathrm{M}_t$. In contrast, in the supervised setting, noise is omitted from the regions with actual anomalies, denoted by $\mathrm{M}_{gt}$. The final anomaly mask $\mathrm{M}$ is constructed from $\mathrm{M}_a$ and $\mathrm{M}_{gt}$, and holds information on both where the Gaussian noise is added and where the actual anomalies lie.
  • Figure 4: Qualitative comparison of anomaly maps produced in the supervised setting: the input image, the ground truth, and the overlaid anomaly map for SuperSimpleNet, SimpleNet, PRN, and BGAD. The first row displays two samples from KSDD2; the second and third rows show SensumSODF capsule and softgel samples, respectively. The anomaly score is displayed in the top right corner of each overlaid anomaly map. The anomaly score from the classification head proves to be more reliable than the established maximum value of the anomaly mask.
  • Figure 5: Qualitative comparison of anomaly maps produced by unsupervised SuperSimpleNet and SimpleNet. The top row shows the input anomalous image. The second row displays the ground truth anomaly mask. The third and fourth rows contain anomaly maps generated by SuperSimpleNet and SimpleNet, respectively. The anomaly score is displayed in the top right corner of each anomaly map.
  • ...and 2 more figures
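
The mask construction and feature perturbation described in the captions of Figures 2 and 3 can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the helper names (`smooth_noise`, `make_training_sample`), the `threshold` and `sigma` defaults, and the grid size are hypothetical, and a bilinearly upsampled random grid is used as a simple stand-in for true Perlin noise. The combination rules (M_a as M_t with real-defect regions removed, M as the union of M_a and M_gt) are one reading of the captions.

```python
import numpy as np


def smooth_noise(h, w, cells, rng):
    """Stand-in for Perlin noise: bilinear upsampling of a coarse random grid."""
    coarse = rng.uniform(-1.0, 1.0, size=(cells + 1, cells + 1))
    ys = np.linspace(0, cells, h, endpoint=False)
    xs = np.linspace(0, cells, w, endpoint=False)
    y0, x0 = ys.astype(int), xs.astype(int)
    ty, tx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = coarse[y0][:, x0] * (1 - tx) + coarse[y0][:, x0 + 1] * tx
    bottom = coarse[y0 + 1][:, x0] * (1 - tx) + coarse[y0 + 1][:, x0 + 1] * tx
    return top * (1 - ty) + bottom * ty


def make_training_sample(adapted, m_gt, threshold=0.2, sigma=0.015, rng=None):
    """Perturb an adapted feature map A of shape (C, H, W).

    m_gt is the (H, W) binary mask of real anomalies; pass all zeros in the
    unsupervised setting. threshold and sigma are illustrative values only.
    """
    rng = np.random.default_rng() if rng is None else rng
    m_gt = np.asarray(m_gt, dtype=np.float32)
    _, h, w = adapted.shape

    # Thresholded noise mask M_t (binarised stand-in for the Perlin mask).
    m_t = (smooth_noise(h, w, cells=4, rng=rng) > threshold).astype(np.float32)
    # Assumed rule: synthetic mask M_a excludes regions with real anomalies.
    m_a = m_t * (1.0 - m_gt)
    # Assumed rule: final mask M marks both synthetic and real anomalies.
    m = np.clip(m_a + m_gt, 0.0, 1.0)

    # Gaussian noise is added only inside M_a, giving the perturbed map P.
    eps = rng.normal(0.0, sigma, size=adapted.shape).astype(adapted.dtype)
    perturbed = adapted + eps * m_a[None, :, :]

    # Image-level label y: 1 if any anomaly (synthetic or real) is present.
    y = float(m.any())
    return perturbed, m, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(8, 16, 16)).astype(np.float32)
    real_defects = np.zeros((16, 16), dtype=np.float32)
    real_defects[2:5, 2:5] = 1.0
    p, mask, label = make_training_sample(features, real_defects, rng=rng)
    print(p.shape, mask.shape, label)
```

In this sketch the features under real defects are left untouched, so the segmentation head would see genuine anomalous features there and noise-perturbed features elsewhere inside the mask, while the single mask M supervises both kinds of region.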