TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection

Matic Fučka, Vitjan Zavrtanik, Danijel Skočaj

TL;DR

This work tackles surface anomaly detection in manufacturing, where prior two-stage discriminative pipelines struggle due to reconstruction errors and loss of detail. It introduces TransFusion, a transparency-based diffusion framework that iteratively erases anomalies by increasing their transparency while leveraging localization cues to preserve normal regions, effectively combining reconstruction and localization in a single process. Key contributions include the transparency-based diffusion model, a ResUNet-based architecture with three heads for anomaly appearance, mask, and normal appearance, a synthetic anomaly generation pipeline, and a robust final mask fusion strategy that blends discriminative and reconstructive cues. On VisA and MVTec AD, TransFusion achieves state-of-the-art image-level AUROCs ($98.5\%$ and $99.2\%$ respectively) and an average AUROC of $98.9\%$ across both datasets, with strong localization (AUPRO) and qualitative improvements in mask precision and reconstruction fidelity, demonstrating the value of task-specific diffusion for anomaly detection.

Abstract

Surface anomaly detection is a vital component in manufacturing inspection. Current discriminative methods follow a two-stage architecture composed of a reconstructive network followed by a discriminative network that relies on the reconstruction output. Currently used reconstructive networks often produce poor reconstructions that either still contain anomalies or lack details in anomaly-free regions. Discriminative methods are robust to some reconstructive network failures, suggesting that the discriminative network learns a strong normal appearance signal that the reconstructive networks miss. We reformulate the two-stage architecture into a single-stage iterative process that allows the exchange of information between the reconstruction and localization. We propose a novel transparency-based diffusion process where the transparency of anomalous regions is progressively increased, restoring their normal appearance accurately while maintaining the appearance of anomaly-free regions using localization cues of previous steps. We implement the proposed process as TRANSparency DifFUSION (TransFusion), a novel discriminative anomaly detection method that achieves state-of-the-art performance on both the VisA and the MVTec AD datasets, with an image-level AUROC of 98.5% and 99.2%, respectively. Code: https://github.com/MaticFuc/ECCV_TransFusion

Paper Structure (22 sections, 11 equations, 11 figures, 9 tables)

Figures (11)

  • Figure 1: a) Unlike previous discriminative approaches, the proposed approach reconstructs and localizes the anomalies simultaneously through an iterative process, which results in a more potent normality model capable of detecting harder near-distribution anomalies. b) The reformulated diffusion model iteratively erases the anomalous regions during the reverse process. Training on synthetic anomalies (top) generalizes well to real anomalies (marked with red circles) seen at inference (bottom), leading to accurate output masks $M_{final}$ that closely match the ground truth $M_{true}$.
  • Figure 1: Failure case results. The anomalous images are shown in the first row, the overlay in the second row, the reconstructions in the third row, and the predicted and ground truth masks in the fourth and fifth rows, respectively. The biggest discrepancies between the predicted and ground truth masks are marked with red circles.
  • Figure 2: TransFusion's training and inference pipelines. Training examples are created from normal images $x$ by generating the anomaly mask $M$ and the anomaly appearance $\epsilon$ and imposing them on $x$ according to the transparency schedule $\beta_t$. The resulting image $x_t$ contains synthetic anomalies. TransFusion is guided by an augmented mask $M_a$. TransFusion outputs the estimated anomaly mask $M_t$, the anomaly appearance $\epsilon_t$, and the normal appearance $n_t$. At inference, TransFusion infers $M_t$, $\epsilon_t$, and $n_t$ from the input image and constructs the next step image according to Eq. \ref{eq:next_step_eq}. The predicted mask $M_t$ and the constructed $x_{t-1}$ are used as the input in the next step.
  • Figure 2: Average AUROC for different weights $\lambda$ in the final mask calculation. The maximum point of each line is represented with a dot.
  • Figure 3: TransFusion inference. For every fourth timestep, the input image $x_t$ and the predictions for the mask $M_t$, anomaly appearance $\epsilon_t$ and normal appearance $n_t$ are shown. As seen in the top row, TransFusion first reconstructs larger anomalies and inpaints the details near the end of the reconstruction process.
  • ...and 6 more figures
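
The Figure 2 caption describes training examples as a normal image $x$ with an anomaly appearance $\epsilon$ imposed inside a mask $M$ at a transparency level $\beta_t$. A minimal sketch of one plausible reading is given below. The blending rule (standard alpha compositing with per-pixel opacity $\beta_t M$), the function name `blend_synthetic_anomaly`, and the array shapes are assumptions made for illustration; the exact formulation is not stated in this summary and may differ in the paper and its released code.

```python
import numpy as np


def blend_synthetic_anomaly(x, mask, eps, beta_t):
    """Impose anomaly appearance `eps` on normal image `x` inside `mask`.

    Assumed rule (illustrative, not taken from the source):
        x_t = (1 - beta_t * M) * x + beta_t * M * eps
    so beta_t = 0 yields the normal image and beta_t = 1 yields a fully
    opaque anomaly inside the mask.
    """
    x = np.asarray(x, dtype=np.float64)        # normal image, shape (H, W, C)
    mask = np.asarray(mask, dtype=np.float64)  # anomaly mask in [0, 1], shape (H, W, 1)
    eps = np.asarray(eps, dtype=np.float64)    # anomaly appearance, shape (H, W, C)
    if not 0.0 <= beta_t <= 1.0:
        raise ValueError("beta_t must lie in [0, 1]")
    alpha = beta_t * mask                      # per-pixel opacity of the anomaly
    return (1.0 - alpha) * x + alpha * eps


# Usage: a half-transparent anomaly on the central 2x2 patch of a 4x4 image.
rng = np.random.default_rng(0)
x = rng.random((4, 4, 3))
eps = rng.random((4, 4, 3))
mask = np.zeros((4, 4, 1))
mask[1:3, 1:3] = 1.0
x_t = blend_synthetic_anomaly(x, mask, eps, beta_t=0.5)
```

Under this assumed rule, pixels outside the mask are left untouched for every $\beta_t$, which matches the stated aim of preserving anomaly-free regions while the anomalous regions are made progressively more transparent.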