U-Flow: A U-shaped Normalizing Flow for Anomaly Detection with Unsupervised Threshold

Matías Tailanian, Álvaro Pardo, Pablo Musé

TL;DR

This paper tackles image anomaly detection and localization under fully unsupervised conditions, addressing the need for robust thresholds in real-world segmentation. It introduces U-Flow, a four-phase pipeline that fuses multi-scale Transformer features (MS-CaiT) with a fully invertible U-shaped Normalizing Flow to produce pixelwise likelihoods, followed by an a contrario-based segmentation that yields unsupervised thresholds for anomaly masks. The key contributions are: (i) a multi-scale, independently pretrained feature extractor; (ii) a novel U-shaped NF that enforces inter- and intra-scale independence; (iii) a pixelwise anomaly score based on joint likelihoods; and (iv) a principled, parameter-free thresholding via Number of False Alarms, enabling accurate segmentation without labeled anomalies. The approach achieves state-of-the-art results on MVTec-AD and shows strong generalization to other datasets, with robust performance and an openly available implementation, facilitating practical deployment where thresholds must be inferred automatically.

Abstract

In this work we propose a one-class self-supervised method for anomaly segmentation in images that benefits both from a modern machine learning approach and a more classic statistical detection theory. The method consists of four phases. First, features are extracted using a multi-scale image Transformer architecture. Then, these features are fed into a U-shaped Normalizing Flow (NF) that lays the theoretical foundations for the subsequent phases. The third phase computes a pixel-level anomaly map from the NF embedding, and the last phase performs a segmentation based on the a contrario framework. This multiple hypothesis testing strategy permits the derivation of robust unsupervised detection thresholds, which are crucial in real-world applications where an operational point is needed. The segmentation results are evaluated using the Mean Intersection over Union (mIoU) metric, and for assessing the generated anomaly maps we report the Area Under the Receiver Operating Characteristic curve (AUROC), as well as the Area Under the Per-Region-Overlap curve (AUPRO). Extensive experimentation on various datasets shows that the proposed approach produces state-of-the-art results for all metrics and all datasets, ranking first in most MVTec-AD categories, with a mean pixel-level AUROC of 98.74%. Code and trained models are available at https://github.com/mtailanian/uflow.
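As a rough illustration of the segmentation metric named in the abstract, the sketch below computes Intersection over Union for binary masks and averages it over a set of images. It is a dependency-free sketch, not the authors' evaluation code: the function names `iou` and `mean_iou`, the nested-list mask format, and the convention that two empty masks score 1.0 are all assumptions made here for illustration.

```python
def iou(pred, gt):
    """Intersection over Union of two binary masks given as nested lists of 0/1."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += bool(p) and bool(g)  # pixel flagged in both masks
            union += bool(p) or bool(g)   # pixel flagged in at least one mask
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds, gts):
    """Average the per-image IoU over paired lists of masks."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


# Toy example: one partially correct mask and one pair of empty masks.
preds = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]
gts = [[[1, 0], [0, 0]], [[0, 0], [0, 0]]]
print(mean_iou(preds, gts))  # 0.75
```

How images without any anomalous pixel are handled changes the mean noticeably, so that convention should be checked against the released code before comparing numbers.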

Paper Structure

This paper contains 20 sections, 10 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Anomalies detected with the proposed approach on MVTec-AD examples from different categories. Top row: original images with ground truth segmentation. Second row: corresponding anomaly maps. Third row: automatic segmentations. Last row: ground truth masks.
  • Figure 2: The method consists of four phases. (1) Multi-scale feature extraction: a rich multi-scale representation is obtained with MS-CaiT by combining pre-trained image Transformers acting at different image scales. (2) U-shaped Normalizing Flow: by adapting the widely used U-like architecture to NFs, a fully invertible architecture is designed. This architecture is capable of merging the information from different scales while ensuring both intra- and inter-scale independence. To make it fully invertible, split and invertible up-sampling operations are used. (3) Anomaly score map generation: an anomaly map is generated by associating a likelihood-based anomaly score with each pixel in the test image. (4) Anomaly segmentation: besides generating the anomaly map, we also propose to adapt the a contrario framework to obtain an automatic threshold by controlling the allowed number of false alarms.
  • Figure 3: Tree of connected components of the upper level sets: the hierarchical representation based on the level sets used to retrieve the most significant regions to be tested for anomaly segmentation.
  • Figure 4: Example level lines of a branch of the tree of connected components.
  • Figure 5: Example results for all MVTec categories. The first row shows the example images with the ground truth superimposed in red. The results for FastFlow, CFlow, and CS-Flow are shown in the second, third, and fourth rows. The next two rows correspond to our method: the anomaly score defined in Eq. \ref{ec:as-likelihood}, and the segmentation obtained with the automatic threshold $\log(\text{NFA}) < 0$. The last row presents the segmentation masks for easy comparison. While other methods achieve very good performance, in some cases they present artifacts and over-estimated anomaly scores. Our anomaly score achieves very good visual and numerical results, spotting anomalies with high confidence. Finally, the segmentation with the automatic threshold on the NFA is also able to spot and accurately segment the anomalies. All detections in these examples exhibit very low $\log(\text{NFA})$ values, ranging from $-50$ to $-1515$.
  • ...and 3 more figures
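
The automatic threshold $\log(\text{NFA}) < 0$ mentioned in the captions can be illustrated with a deliberately simplified sketch. The method described above tests regions taken from the tree of connected components of the upper level sets; the code below instead tests each pixel on its own, which is a stand-in chosen here for brevity and not the authors' procedure. It further assumes that, for normal images, the latent vector at each pixel has independent standard normal components, so that its squared norm follows a chi-square distribution, and that the number of tests equals the number of pixels. The function names and the nested-list latent format are hypothetical, and the closed-form tail used is only valid for an even number of channels.

```python
import math


def chi2_log_sf_even(x, dof):
    """log P(X >= x) for X ~ chi-square with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!, evaluated in
    log space so that extremely small probabilities do not underflow to zero.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs an even, positive dof")
    if x <= 0:
        return 0.0  # the tail probability is 1
    half = x / 2.0
    log_terms = [i * math.log(half) - math.lgamma(i + 1) for i in range(dof // 2)]
    peak = max(log_terms)
    return -half + peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def log_nfa_map(latent, n_tests=None):
    """Per-pixel log(NFA) = log(n_tests) + log P(chi2 >= ||z||^2).

    latent: H x W x C nested lists (assumed standard normal under normality).
    """
    height, width = len(latent), len(latent[0])
    if n_tests is None:
        n_tests = height * width  # assumption: one test per pixel
    return [
        [math.log(n_tests) + chi2_log_sf_even(sum(v * v for v in z), len(z)) for z in row]
        for row in latent
    ]


def segment(log_nfa):
    """Flag pixels whose expected number of false alarms is below 1."""
    return [[value < 0.0 for value in row] for row in log_nfa]


# Toy 2 x 2 latent map with 2 channels; only one pixel is far from the origin.
latent = [[[0.0, 0.0], [6.0, 0.0]], [[1.0, 1.0], [0.5, 0.5]]]
print(segment(log_nfa_map(latent)))  # [[False, True], [False, False]]
```

Working in log space matters in practice: values as low as the $-1515$ reported in the Figure 5 caption cannot be represented as an ordinary floating-point probability.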