TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
Matic Fučka, Vitjan Zavrtanik, Danijel Skočaj
TL;DR
This work tackles surface anomaly detection in manufacturing, where prior two-stage discriminative pipelines struggle due to reconstruction errors and loss of detail. It introduces TransFusion, a transparency-based diffusion framework that iteratively erases anomalies by increasing their transparency while leveraging localization cues to preserve normal regions, effectively combining reconstruction and localization in a single process. Key contributions include the transparency-based diffusion model, a ResUNet-based architecture with three heads for anomaly appearance, mask, and normal appearance, a synthetic anomaly generation pipeline, and a robust final mask fusion strategy that blends discriminative and reconstructive cues. On VisA and MVTec AD, TransFusion achieves state-of-the-art image-level AUROCs ($98.5\%$ and $99.2\%$ respectively) and an average AUROC of $98.9\%$ across both datasets, with strong localization (AUPRO) and qualitative improvements in mask precision and reconstruction fidelity, demonstrating the value of task-specific diffusion for anomaly detection.
Abstract
Surface anomaly detection is a vital component in manufacturing inspection. Current discriminative methods follow a two-stage architecture composed of a reconstructive network followed by a discriminative network that relies on the reconstruction output. These reconstructive networks often produce poor reconstructions that either still contain anomalies or lack details in anomaly-free regions. Discriminative methods are robust to some reconstructive network failures, suggesting that the discriminative network learns a strong normal appearance signal that the reconstructive networks miss. We reformulate the two-stage architecture into a single-stage iterative process that allows the exchange of information between reconstruction and localization. We propose a novel transparency-based diffusion process where the transparency of anomalous regions is progressively increased, restoring their normal appearance accurately while maintaining the appearance of anomaly-free regions using localization cues from previous steps. We implement the proposed process as TRANSparency DifFUSION (TransFusion), a novel discriminative anomaly detection method that achieves state-of-the-art performance on both the VisA and the MVTec AD datasets, with image-level AUROCs of 98.5% and 99.2%, respectively. Code: https://github.com/MaticFuc/ECCV_TransFusion
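
The transparency idea described above can be illustrated with a small NumPy sketch. It assumes an alpha-compositing view that the abstract does not spell out: an anomalous image is treated as a normal appearance `normal` overlaid with an anomaly appearance `anomaly` inside a mask `mask` at opacity `beta`, and each step lowers `beta`, that is, raises the transparency. The function names `composite` and `erase_anomaly`, the linear opacity schedule, and the use of known `normal`, `anomaly`, and `mask` arrays in place of network predictions are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def composite(normal, anomaly, mask, beta):
    """Overlay `anomaly` on `normal` inside `mask` with opacity `beta` in [0, 1]."""
    alpha = beta * mask  # per-pixel opacity; zero outside the mask
    return (1.0 - alpha) * normal + alpha * anomaly

def erase_anomaly(normal, anomaly, mask, steps=5):
    """Return the image sequence as opacity falls linearly from 1 to 0.

    The linear schedule is an assumption; a trained model would predict the
    anomaly appearance, mask, and normal appearance at every step instead of
    receiving them as inputs.
    """
    betas = np.linspace(1.0, 0.0, steps + 1)
    return [composite(normal, anomaly, mask, b) for b in betas]

# Toy data: an 8x8 "normal" texture with a 3x3 anomalous patch.
rng = np.random.default_rng(0)
normal = rng.random((8, 8))
anomaly = rng.random((8, 8))
mask = np.zeros((8, 8))
mask[2:5, 2:5] = 1.0

frames = erase_anomaly(normal, anomaly, mask, steps=5)

# Pixels outside the mask are never modified; the last frame is anomaly-free.
outside = mask == 0
unchanged_outside = all(np.allclose(f[outside], normal[outside]) for f in frames)
fully_restored = np.allclose(frames[-1], normal)
```

In this toy setting the anomaly-free pixels stay identical to `normal` at every step, which mirrors the stated goal of preserving normal regions while only the masked region is gradually restored.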
