Explainable Visual Anomaly Detection via Concept Bottleneck Models

Arianna Stropeni, Valentina Zaccaria, Francesco Borsatti, Davide Dalle Pezze, Manuel Barusco, Gian Antonio Susto

TL;DR

This paper addresses the interpretability gap in Visual Anomaly Detection (VAD) by adapting Concept Bottleneck Models (CBMs) so that anomaly predictions are made through human-understandable concepts. It introduces CONVAD, which couples a concept extractor $g: \mathcal{X} \rightarrow \mathcal{C}$ with a predictor $f: \mathcal{C} \rightarrow \mathcal{Y}$, so that each prediction $\hat{y}=f(g(\mathbf{x}))$ can be explained by its intermediate concepts. Because the predictor sees only the predicted concepts $\hat{c}=g(\mathbf{x})$, a user can correct $\hat{c}$ at test time to improve accuracy. The approach adds three components: a Concept Dataset Pipeline that automatically annotates industrial images, a Visual Explanation module that localizes anomalies through student-teacher distillation, and a Synthetic Anomaly Generation (SAG) pipeline that maintains the unsupervised nature of VAD. On the MVTec dataset, CONVAD achieves competitive image- and pixel-level performance while providing richer, concept-driven explanations; interventions on concepts further boost performance, highlighting the value of human-in-the-loop control in VAD. The work identifies improved novelty detection and more refined synthetic anomaly generation as future directions for narrowing the gap between synthetic and real anomaly distributions.
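To make the composition $\hat{y}=f(g(\mathbf{x}))$ and the idea of a test-time intervention concrete, the following is a minimal sketch, not CONVAD itself. It assumes random linear maps with sigmoid outputs as stand-ins for $g$ and $f$; the class `ToyConceptBottleneck`, its parameters, and the feature and concept sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


class ToyConceptBottleneck:
    """Hypothetical stand-in for a CBM: g maps features to concepts, f maps concepts to a score."""

    def __init__(self, n_features, n_concepts, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: random linear weights replace the trained networks.
        self.W_g = rng.normal(size=(n_features, n_concepts))
        self.w_f = rng.normal(size=n_concepts)

    def g(self, x):
        """Concept extractor: returns concept probabilities c_hat in (0, 1)."""
        return sigmoid(x @ self.W_g)

    def f(self, c):
        """Predictor: returns an anomaly score computed from concepts only."""
        return sigmoid(c @ self.w_f)

    def forward(self, x, interventions=None):
        """Compute y_hat = f(g(x)); optionally overwrite selected concepts first.

        `interventions` maps a concept index to the value a human supplies.
        """
        c_hat = self.g(x)
        if interventions:
            c_hat = c_hat.copy()
            for idx, value in interventions.items():
                c_hat[..., idx] = value
        return c_hat, self.f(c_hat)


model = ToyConceptBottleneck(n_features=8, n_concepts=4)
x = np.random.default_rng(1).normal(size=(3, 8))  # three fake feature vectors

c_hat, y_hat = model.forward(x)                               # plain prediction
c_int, y_int = model.forward(x, interventions={0: 1.0})       # expert sets concept 0 to "present"
print("score change after intervention:", y_int - y_hat)
```

Since the predictor only ever reads the concept vector, overwriting one entry changes the score through that concept alone, which is what makes the effect of a correction traceable.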

Abstract

In recent years, Visual Anomaly Detection (VAD) has gained significant attention due to its ability to identify anomalous images using only normal images during training. Many VAD models work without supervision but are still able to provide visual explanations by highlighting the anomalous regions within an image. However, although these visual explanations can be helpful, they lack a direct and semantically meaningful interpretation for users. To address this limitation, we propose extending Concept Bottleneck Models (CBMs) to the VAD setting. By learning meaningful concepts, the network can provide human-interpretable descriptions of anomalies, offering a novel and more insightful way to explain them. Our contributions are threefold: (i) we develop a Concept Dataset to support research on CBMs for VAD; (ii) we improve the CBM architecture to generate both concept-based and visual explanations, bridging semantic and localization interpretability; and (iii) we introduce a pipeline for synthesizing artificial anomalies, preserving the VAD paradigm of minimizing dependence on rare anomalous samples. Our approach, Concept-Aware Visual Anomaly Detection (CONVAD), achieves performance comparable to classic VAD methods while providing richer, concept-driven explanations that enhance interpretability and trust in VAD systems.

Paper Structure

This paper contains 22 sections, 6 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: In a fully supervised setting, CONVAD achieves performance comparable to established VAD models while also offering interpretable, concept-based explanations. It remains competitive in settings where anomalous images are scarce.
  • Figure 2: Pipeline for creating the Concept Dataset through concept annotation by a VLM.
  • Figure 3: CONVAD architecture with (i) the CBM Module and (ii) the Vision Module.
  • Figure 4: Performance gain for the three CBM paradigms on the screw category as the number of concepts intervened on increases. The horizontal line indicates the baseline performance obtained by training the feature extractor directly for anomaly detection, without the concept prediction bottleneck.
  • Figure 5: Examples of well-generated synthetic anomalous images.
  • ...and 2 more figures