Test Time Training for Industrial Anomaly Segmentation

Alex Costanzino; Pierluigi Zama Ramirez; Mirko Del Moro; Agostino Aiezzo; Giuseppe Lisanti; Samuele Salti; Luigi Di Stefano

TL;DR

This paper tackles the problem of producing accurate binary anomaly segmentations in industrial Anomaly Detection and Segmentation (AD&S) by introducing Test Time Training for Anomaly Segmentation (TTT4AS). TTT4AS trains a per-image SVM at test time on features from a frozen, general-purpose extractor $\mathcal{F}$, using pseudo-labels derived from the anomaly score map $\Psi$, and thereby converts raw anomaly scores into sharper, dense binary masks $\overline{\Psi}$ without retraining the base model. The method is designed to run downstream of any AD&S technique and is demonstrated on RGB and multimodal (RGB+3D) benchmarks (MVTec AD and MVTec 3D-AD) across multiple backbones (e.g., WideResNet-50, DINO-v2, Point-MAE). The reported results show substantial improvements in segmentation metrics (notably F1) over standard threshold-based binarization, robustness to the choice of percentile threshold, and applicability to both memory-bank and reconstruction-based AD&S methods. The approach is general, requires no anomalous training data, and targets better defect localization for industrial quality control. Throughout, $\Psi$, $\mathcal{F}$, $\overline{F}$, and $\overline{\Psi}$ denote the anomaly score map, the feature extractor, the upsampled features, and the final binary map, respectively.
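The per-image loop summarised above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `train_linear_svm` and `ttt_binarize` and the parameters `hi_pct` and `lo_pct` are hypothetical, a linear SVM fitted by sub-gradient descent on the hinge loss stands in for whatever SVM variant the paper uses, and taking the lowest-scoring pixels as nominal pseudo-labels is an assumption made only to keep the example short.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM by full-batch sub-gradient descent on the hinge loss.

    X: (n, d) features; y: (n,) labels in {-1, +1}. Stand-in classifier.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # samples that violate the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def ttt_binarize(features, scores, hi_pct=99.0, lo_pct=50.0):
    """Train on one test sample, then predict a dense mask for that same sample.

    features: (H, W, C) upsampled feature map from a frozen extractor.
    scores:   (H, W) anomaly score map from any AD&S method.
    """
    H, W, C = features.shape
    flat_f = features.reshape(-1, C)
    flat_s = scores.reshape(-1)

    # Sparse pseudo-labels (assumption): highest scores are treated as
    # anomalous, lowest scores as nominal; everything else is unlabeled.
    pos = flat_s >= np.percentile(flat_s, hi_pct)
    neg = flat_s <= np.percentile(flat_s, lo_pct)
    X = np.concatenate([flat_f[pos], flat_f[neg]])
    y = np.concatenate([np.ones(pos.sum()), -np.ones(neg.sum())])

    # Standardise features with statistics of the pseudo-labeled points.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-8
    w, b = train_linear_svm((X - mu) / sigma, y)

    # Classify every pixel of the same sample to obtain the binary map.
    decision = ((flat_f - mu) / sigma) @ w + b
    return (decision > 0).reshape(H, W)
```

The classifier is discarded after each sample, so nothing learned on one image carries over to the next, and the base AD&S model is never updated.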

Abstract

Anomaly Detection and Segmentation (AD&S) is crucial for industrial quality control. While existing methods excel in generating anomaly scores for each pixel, practical applications require producing a binary segmentation to identify anomalies. Due to the absence of labeled anomalies in many real scenarios, standard practices binarize these maps based on some statistics derived from a validation set containing only nominal samples, resulting in poor segmentation performance. This paper addresses this problem by proposing a test time training strategy to improve the segmentation performance. Indeed, at test time, we can extract rich features directly from anomalous samples to train a classifier that can discriminate defects effectively. Our general approach can work downstream to any AD&S method that provides an anomaly score map as output, even in multimodal settings. We demonstrate the effectiveness of our approach over baselines through extensive experimentation and evaluation on MVTec AD and MVTec 3D-AD.
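For contrast, the threshold-based baseline that the abstract criticises can be sketched as below. The abstract only says that the threshold comes from "some statistics" of a nominal-only validation set, so the mean-plus-`k`-standard-deviations rule, the value of `k`, and the function names `fit_threshold` and `binarize` are assumptions chosen for illustration.

```python
import numpy as np

def fit_threshold(nominal_score_maps, k=3.0):
    """Derive one global threshold from anomaly scores of nominal samples only.

    Assumed rule: mean + k * std over all validation pixels.
    """
    scores = np.concatenate([np.asarray(m, dtype=float).ravel()
                             for m in nominal_score_maps])
    return float(scores.mean() + k * scores.std())

def binarize(score_map, threshold):
    """Mark a pixel as anomalous when its score exceeds the global threshold."""
    return np.asarray(score_map) > threshold
```

Because the threshold is fixed before any defect is seen, it cannot adapt to the score range of an individual test image, which is the weakness the test time training strategy is meant to address.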

Paper Structure (6 sections, 1 equation, 6 figures, 6 tables)

This paper contains 6 sections, 1 equation, 6 figures, 6 tables.

Introduction
Related Works
Method
Experimental Settings
Experimental Results
Limitations & Conclusion

Figures (6)

Figure 1: Binary anomaly segmentation maps of anomalous samples. Our approach, TTT4AS, enhances the quality of the binary anomaly segmentation masks. TTT4AS can be applied downstream to any anomaly detection and segmentation method that provides an anomaly score. The column THR represents the output of our baseline, a binarization obtained by computing a threshold based on the score statistics on a validation set, which contains only nominal samples.
Figure 2: TTT4AS Overview. Given a single test input $I$ such as an RGB image, a feature extractor $\mathcal{F}$ extracts a feature map, while an AD&S method predicts an anomaly score map $\Psi$. Then, exploiting $\Psi$, we create pseudo-labels for a sparse subset of points. Pseudo-labels and the corresponding features are employed as training data for an SVM Classifier. Finally, the trained SVM processes the dense feature map of the same test sample to predict a binary anomaly map $\overline{\Psi}$.
Figure 3: Pseudo-labels Selection. Starting from an anomaly score map (top left), all local maxima are found by comparing each value with its neighbours (top right). Then, the peaks above a certain percentile (gray plane, bottom left) are kept while the others are suppressed. Finally, the non-suppressed maxima are enriched with their spatial neighbouring points (purple spheres, bottom right) and labeled as anomalous.
Figure 4: Test Time Training. The binary classifier is trained on both easy and hard samples for both classes, retrieved thanks to the aforementioned pseudo-labeling procedure.
Figure 5: MVTec AD Qualitative Results. We show for each class: RGB, ground truth followed by anomaly score, binary segmentation maps with thresholding, binary segmentation maps with TTT4AS for PatchCore (Roth et al., 2022) with backbone WideResNet50 (Zagoruyko and Komodakis, 2016) and DINO-v2 (Oquab et al., 2023).
...and 1 more figure
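The pseudo-label selection described in the Figure 3 caption (local maxima, percentile filtering, neighbourhood enrichment) can be sketched with NumPy alone. This is an illustrative reading of the caption, not the paper's code: the 8-connected neighbourhood, the strict comparison for maxima, the square enrichment window, and the names `local_maxima`, `dilate`, `anomalous_pseudo_labels`, `percentile`, and `radius` are all assumptions.

```python
import numpy as np

def local_maxima(score):
    """True where a pixel is strictly greater than its 8 neighbours.

    Strict comparison means flat plateaus yield no maxima (assumed choice).
    """
    score = np.asarray(score, dtype=float)
    H, W = score.shape
    padded = np.pad(score, 1, mode="constant", constant_values=-np.inf)
    is_max = np.ones((H, W), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
            is_max &= score > neighbour
    return is_max

def dilate(mask, radius):
    """Grow a boolean mask by a square window of the given radius."""
    H, W = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((H, W), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + H, dx:dx + W]
    return out

def anomalous_pseudo_labels(score, percentile=99.0, radius=1):
    """Peaks above the percentile, enriched with their spatial neighbours."""
    score = np.asarray(score, dtype=float)
    peaks = local_maxima(score)
    peaks &= score >= np.percentile(score, percentile)  # suppress weak peaks
    return dilate(peaks, radius)
```

The returned mask marks only the points pseudo-labeled as anomalous; how nominal points are chosen is not covered by this caption and is left out of the sketch.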
