A SAM-guided Two-stream Lightweight Model for Anomaly Detection

Chenghao Li, Lei Qi, Xin Geng

TL;DR

A SAM-guided Two-stream Lightweight Model for unsupervised anomaly detection (STLM) that meets two practical requirements, model efficiency and mobile-friendliness, while harnessing the robust generalization capabilities of SAM.

Abstract

In industrial anomaly detection, model efficiency and mobile-friendliness have become primary concerns in real-world applications. At the same time, the impressive generalization capabilities of Segment Anything (SAM) have garnered broad academic attention, making it an ideal choice for localizing unseen anomalies and diverse real-world patterns. Considering these two factors, we propose a SAM-guided Two-stream Lightweight Model for unsupervised anomaly detection (STLM) that meets both practical requirements while harnessing the robust generalization capabilities of SAM. We employ two lightweight image encoders, i.e., our two-stream lightweight module, guided by SAM's knowledge. Specifically, one stream is trained to generate discriminative and general feature representations in both normal and anomalous regions, while the other stream reconstructs the same images without anomalies, which increases the difference between the two streams' representations in anomalous regions. Furthermore, we employ a shared mask decoder and a feature aggregation module to generate anomaly maps. Our experiments on the MVTec AD benchmark show that STLM, with about 16M parameters and an inference time of 20 ms, is competitive with state-of-the-art methods, reaching 98.26% pixel-level AUC and 94.92% PRO. We further experiment on more difficult datasets, e.g., VisA and DAGM, to demonstrate the effectiveness and generalizability of STLM.
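
For readers unfamiliar with the pixel-level AUC reported above, a minimal sketch of how such a score can be computed follows. It uses the standard rank-sum formulation of AUROC over flattened pixels. The function name `pixel_auroc` and the toy data are illustrative assumptions, and the paper's exact evaluation protocol is not stated in this abstract.

```python
def pixel_auroc(scores, labels):
    """AUROC over flattened pixels via the rank-sum (Mann-Whitney U) statistic.

    scores: anomaly scores, one per pixel (higher means more anomalous)
    labels: ground-truth mask values, 1 for anomalous pixels, 0 for normal
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both normal and anomalous pixels")

    # Sum the 1-based ranks of anomalous pixels, averaging ranks over ties.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(pairs[k][1] for k in range(i, j))
        i = j

    # U counts (anomalous, normal) pixel pairs that are ranked correctly.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy 2x2 anomaly map and ground-truth mask, flattened row by row.
anomaly_map = [[0.1, 0.9], [0.2, 0.8]]
gt_mask = [[1, 0], [0, 1]]
flat_scores = [s for row in anomaly_map for s in row]
flat_labels = [m for row in gt_mask for m in row]
auroc = pixel_auroc(flat_scores, flat_labels)
```

In this toy case only one of the four (anomalous, normal) pixel pairs is ordered correctly, so the score is low. A map that ranks every anomalous pixel above every normal one scores 1.0.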

Paper Structure (38 sections, 9 equations, 9 figures, 9 tables)

This paper contains 38 sections, 9 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Comparison of anomaly detection methods by pixel-level AUROC (vertical axis), inference time (horizontal axis), and relative parameter count (circle radius). Our STLM achieves competitive pixel-level AUROC for anomaly detection while being 8× faster than PatchCore, 4× faster than FOD (which achieves the highest pixel-level AUROC), 1× faster than SimpleNet, and 0.5× faster than RD++ (154.87M). In addition, STLM requires only 16.56M parameters for inference, making it one of the most efficient methods.
  • Figure 2: Overview of STLM. Pseudo anomalies are introduced into normal training images with predefined probabilities and are used exclusively during training. The trainable two-stream lightweight model (TLM) distills different information from a fixed SAM teacher. One stream is trained to generate discriminative and generalized feature representations in both normal and anomalous regions, while the other is trained to match the teacher's features of the same images without corruption, which increases the difference between the two streams' representations in anomalous regions. The element-wise product of the two TLM stream features is used, together with the generated binary anomaly mask, to train the feature aggregation (FA) module. At inference, anomaly maps are generated using only the TLM and the FA module.
  • Figure 3: Visualization examples of VisA [zou2022spot], MVTec LOCO [Loco2022], DAGM [zavrtanik2021draem], and MVTec [bergmann2019mvtec].
  • Figure 4: Visualization examples of our method on MVTec. The Feature Aggregation (FA) module is always effective.
  • Figure 5: Visualization examples of our method on VisA.
  • ...and 4 more figures
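
The training-time data flow described in the Figure 2 caption (pseudo-anomaly injection with a predefined probability, a binary anomaly mask, and an element-wise product of the two stream features) can be sketched roughly as below. This is a toy illustration under stated assumptions, not the authors' implementation: the rectangular noise patch is a stand-in for whatever pseudo-anomaly generator the paper uses, the helper names `inject_pseudo_anomaly` and `elementwise_product` are hypothetical, and the encoders, SAM teacher, and FA module are not modeled.

```python
import random


def inject_pseudo_anomaly(image, prob, rng):
    """With probability `prob`, overwrite a random rectangle with noise.

    Assumption: a rectangular noise patch stands in for the paper's
    pseudo-anomaly generator. Returns (image copy, binary anomaly mask).
    """
    h, w = len(image), len(image[0])
    corrupted = [row[:] for row in image]  # leave the input untouched
    mask = [[0] * w for _ in range(h)]
    if rng.random() < prob:
        ph, pw = rng.randint(1, h // 2), rng.randint(1, w // 2)
        top, left = rng.randint(0, h - ph), rng.randint(0, w - pw)
        for y in range(top, top + ph):
            for x in range(left, left + pw):
                corrupted[y][x] = rng.random()
                mask[y][x] = 1  # mark the pixel as anomalous
    return corrupted, mask


def elementwise_product(feat_a, feat_b):
    """Element-wise product of two equally sized 2D feature maps."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(feat_a, feat_b)]


rng = random.Random(0)
normal = [[0.5] * 4 for _ in range(4)]

# prob=1.0 always injects a patch; prob=0.0 never does.
corrupted, mask = inject_pseudo_anomaly(normal, prob=1.0, rng=rng)
clean, empty_mask = inject_pseudo_anomaly(normal, prob=0.0, rng=rng)

# Stand-in for the two stream outputs: here simply the two images themselves.
fused = elementwise_product(corrupted, normal)
```

In a real pipeline, the two arguments to `elementwise_product` would be the feature maps of the two lightweight streams, and the fused result together with `mask` would supervise the feature aggregation module.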