Long-Tailed Anomaly Detection with Learnable Class Names

Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos

TL;DR

The paper tackles long-tailed anomaly detection (AD) across multiple image classes without relying on fixed class names. It proposes LTAD, a unified approach that combines AD by reconstruction, implemented with a transformer-based reconstruction module (RM), with semantic AD operating in the text space of a pretrained foundation model, using learnable pseudo class names to handle uninformative or inconsistent class labels. LTAD is trained in two phases: Phase 1 learns the pseudo class names and a VAE-style feature synthesizer conditioned on those names; Phase 2 trains the RM and the patch-projection modules on real and synthesized features. Experiments on long-tailed splits of MVTec, VisA, and DAGM show that LTAD outperforms state-of-the-art methods across most imbalance configurations, with ablations confirming the contributions of the learnable class names, the data augmentation, and the dual-score fusion.

Abstract

Anomaly detection (AD) aims to identify defective images and localize their defects (if any). Ideally, AD models should be able to detect defects over many image classes; without relying on hard-coded class names that can be uninformative or inconsistent across datasets; learn without anomaly supervision; and be robust to the long-tailed distributions of real-world applications. To address these challenges, we formulate the problem of long-tailed AD by introducing several datasets with different levels of class imbalance and metrics for performance evaluation. We then propose a novel method, LTAD, to detect defects from multiple and long-tailed classes, without relying on dataset class names. LTAD combines AD by reconstruction and semantic AD modules. AD by reconstruction is implemented with a transformer-based reconstruction module. Semantic AD is implemented with a binary classifier, which relies on learned pseudo class names and a pretrained foundation model. These modules are learned over two phases. Phase 1 learns the pseudo-class names and a variational autoencoder (VAE) for feature synthesis that augments the training data to combat long-tails. Phase 2 then learns the parameters of the reconstruction and classification modules of LTAD. Extensive experiments using the proposed long-tailed datasets show that LTAD substantially outperforms the state-of-the-art methods for most forms of dataset imbalance. The long-tailed dataset split is available at https://zenodo.org/records/10854201 .
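
The abstract mentions datasets with different levels of class imbalance but does not say here how the splits are built. As a rough illustration only, the sketch below subsamples a balanced label list so that per-class counts decay exponentially. The function names, the `imbalance_factor` parameter, and the exponential profile are assumptions made for this example, not the authors' procedure; the released split at the Zenodo link is the authoritative source.

```python
import numpy as np

def long_tailed_counts(n_max, num_classes, imbalance_factor):
    """Hypothetical helper: counts decay exponentially from n_max
    (head class) down to n_max / imbalance_factor (tail class)."""
    if num_classes == 1:
        return [n_max]
    counts = []
    for rank in range(num_classes):
        frac = rank / (num_classes - 1)  # 0 for the head, 1 for the tail
        counts.append(max(1, int(round(n_max * imbalance_factor ** (-frac)))))
    return counts

def subsample_long_tailed(labels, imbalance_factor, seed=0):
    """Return sorted indices of a long-tailed subset of `labels`.
    Assumption: classes are ranked by their original size, largest first."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, sizes = np.unique(labels, return_counts=True)
    order = np.argsort(-sizes, kind="stable")
    targets = long_tailed_counts(int(sizes.max()), len(classes), imbalance_factor)
    keep = []
    for rank, idx in enumerate(order):
        members = np.flatnonzero(labels == classes[idx])
        n_keep = min(targets[rank], len(members))  # never ask for more than exist
        keep.extend(rng.choice(members, size=n_keep, replace=False).tolist())
    return sorted(keep)
```

With five classes of 200 normal images each and `imbalance_factor=100`, the head class keeps all 200 images and the tail class keeps 2.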

Paper Structure

This paper contains 19 sections, 8 equations, 9 figures, 49 tables.

Figures (9)

  • Figure 1: Challenges of long-tailed AD include (Left) designing a single model to detect anomalies over multiple image classes, (Middle) uninformative class names, and (Right) long-tailed data distributions.
  • Figure 2: Preliminary study with UniAD on MVTec. Image classes (x-axis) are sorted by popularity. (a) Dataset distribution of MVTec vs. long-tailed version. (b) AD performance on the two datasets.
  • Figure 3: The LTAD architecture combines AD by reconstruction and semantic AD scores (${\cal S}_{rec}$ and ${\cal S}_{sem}$, respectively), implemented by the RM and SAD modules. We use an image encoder $E$ and a text encoder $T$ from a pretrained foundation model to extract image features $f^{real}$ and text features $t_{n,c}, t_{a,c}$. The text features are derived from text prompts that include a static component to discriminate between normal ($v^{n}$) and abnormal ($v^{a}$), and a learned component $s_c$ to make this discrimination class-sensitive.
  • Figure 4: Phase 1 of LTAD training learns a VAE-style decoder $D$ for feature augmentation conditioned on a learned pseudo class name $s_c$.
  • Figure 5: Phase 2 of LTAD training learns the parameters of the reconstruction module (RM) and patch projections $\Phi_l$ that map visual features into the semantic space of the semantic AD (SAD) module.
  • ...and 4 more figures
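
The captions above describe two per-patch scores, ${\cal S}_{rec}$ from the reconstruction module and ${\cal S}_{sem}$ from comparing visual features with the normal and abnormal text features $t_{n,c}, t_{a,c}$. A minimal NumPy sketch of how such scores could be computed and combined follows. The cosine-similarity softmax, the `temperature` value, the L2 reconstruction error, and the element-wise product used for fusion are all assumptions for illustration; the captions do not specify these details, and this is not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def semantic_score(patch_feats, t_normal, t_abnormal, temperature=0.07):
    """Assumed form of S_sem: softmax over cosine similarities to the
    normal and abnormal text features; returns P(abnormal) per patch.
    patch_feats: (P, D) projected patch features; t_*: (D,) text features."""
    p = l2_normalize(np.asarray(patch_feats, dtype=float))
    t = l2_normalize(np.stack([t_normal, t_abnormal]).astype(float))  # (2, D)
    logits = p @ t.T / temperature                                    # (P, 2)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs[:, 1]

def reconstruction_score(patch_feats, reconstructed):
    """Assumed form of S_rec: L2 distance between features and their
    reconstruction, one value per patch."""
    diff = np.asarray(patch_feats, dtype=float) - np.asarray(reconstructed, dtype=float)
    return np.linalg.norm(diff, axis=1)

def fuse(s_rec, s_sem):
    """Assumption: fuse the two scores with an element-wise product."""
    return s_rec * s_sem
```

Under this sketch, a patch is flagged only when it is both poorly reconstructed and closer to the abnormal prompt than to the normal one; other fusion rules (for example a weighted sum) would be equally plausible readings of the captions.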