Test Time Training for Industrial Anomaly Segmentation
Alex Costanzino, Pierluigi Zama Ramirez, Mirko Del Moro, Agostino Aiezzo, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano
TL;DR
This paper addresses the problem of producing accurate binary anomaly segmentations in industrial anomaly detection and segmentation (AD&S) by introducing Test Time Training for Anomaly Segmentation (TTT4AS). For each test image, TTT4AS trains an SVM on features from a frozen, general-purpose extractor $\mathcal{F}$, using pseudo-labels derived from the anomaly score map $\Psi$, and thereby converts raw anomaly scores into a sharper, dense binary mask $\overline{\Psi}$ without retraining the base model. The method is designed to run downstream of any AD&S technique and is demonstrated on RGB and multimodal (RGB+3D) benchmarks (MVTec AD and MVTec 3D-AD) across multiple backbones (e.g., WideResNet-50, DINO-v2, Point-MAE). The experiments show substantial improvements in segmentation metrics (notably F1) over standard threshold-based binarization, robustness to the choice of percentile thresholds, and applicability to both memory-bank and reconstruction-based AD&S methods, which makes the approach relevant to industrial quality control. The approach remains general and scalable at test time, and it improves defect localization without requiring anomalous training data. Throughout, $\Psi$ denotes the anomaly score map, $\mathcal{F}$ the feature extractor, $\overline{F}$ the upsampled features, and $\overline{\Psi}$ the final binary map.
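The per-image procedure summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the percentile values, the function names (`pseudo_labels`, `fit_linear_svm`, `ttt_segment`), and the small NumPy linear SVM trained by subgradient descent are all assumptions made to keep the example self-contained. The upsampled features $\overline{F}$ are assumed to be given as an `(H, W, C)` array and the score map $\Psi$ as an `(H, W)` array.

```python
import numpy as np

def pseudo_labels(score_map, low_pct=50.0, high_pct=99.0):
    """Label confident pixels from the anomaly score map.

    Returns +1 (likely anomalous), -1 (likely nominal), 0 (left unlabeled).
    The two percentiles are illustrative assumptions, not values from the paper.
    """
    lo, hi = np.percentile(score_map, [low_pct, high_pct])
    labels = np.zeros(score_map.shape, dtype=np.int8)
    labels[score_map <= lo] = -1
    labels[score_map >= hi] = 1
    return labels

def fit_linear_svm(X, y, lam=1e-3, lr=0.1, epochs=200):
    """Stand-in linear SVM: full-batch subgradient descent on the
    L2-regularized hinge loss. Any SVM implementation could be used instead."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1          # samples violating the margin
        ya, Xa = y[active], X[active]
        w -= lr * (lam * w - (ya[:, None] * Xa).sum(axis=0) / n)
        b -= lr * (-ya.sum() / n)
    return w, b

def ttt_segment(features, score_map, low_pct=50.0, high_pct=99.0):
    """Train a classifier on one test image and return its binary mask."""
    H, W, C = features.shape
    X = features.reshape(-1, C).astype(np.float64)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)   # per-image standardization
    y = pseudo_labels(score_map, low_pct, high_pct).reshape(-1)
    train = y != 0                                       # use only confident pixels
    w, b = fit_linear_svm(X[train], y[train].astype(np.float64))
    return (X @ w + b > 0).reshape(H, W)                 # classify every pixel
```

The point of the design is that the classifier sees only the current test image: pixels with confident scores supply the training labels, and the remaining pixels are decided by their features rather than by a global score threshold.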
Abstract
Anomaly Detection and Segmentation (AD&S) is crucial for industrial quality control. While existing methods excel at generating anomaly scores for each pixel, practical applications require a binary segmentation that identifies anomalies. Because labeled anomalies are absent in many real scenarios, standard practice binarizes these maps using statistics derived from a validation set containing only nominal samples, which results in poor segmentation performance. This paper addresses this problem by proposing a test time training strategy to improve segmentation performance. Indeed, at test time, we can extract rich features directly from anomalous samples to train a classifier that discriminates defects effectively. Our general approach can work downstream of any AD&S method that provides an anomaly score map as output, even in multimodal settings. We demonstrate the effectiveness of our approach over baselines through extensive experimentation and evaluation on MVTec AD and MVTec 3D-AD.
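For contrast, the standard practice mentioned in the abstract, binarizing with a statistic computed on nominal validation samples, can be sketched as follows. The specific statistic (mean plus `k` standard deviations) and the function names are assumptions chosen for illustration; the abstract says only that a statistic of the nominal validation scores is used.

```python
import numpy as np

def nominal_threshold(val_score_maps, k=3.0):
    """Global threshold from nominal-only validation score maps.

    mean + k * std is one common choice, assumed here for illustration.
    """
    scores = np.concatenate([np.asarray(m).ravel() for m in val_score_maps])
    return float(scores.mean() + k * scores.std())

def binarize(score_map, threshold):
    """Mark as anomalous every pixel whose score exceeds the threshold."""
    return np.asarray(score_map) > threshold
```

Because the threshold is fixed once from nominal data, it cannot adapt to the score distribution of an individual anomalous test image, which is the limitation the test time training strategy targets.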
