Long-Tailed Anomaly Detection with Learnable Class Names
Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos
TL;DR
The paper tackles long-tailed anomaly detection (AD) across multiple image classes without relying on fixed class names. It proposes LTAD, a unified approach that combines AD by reconstruction, using a transformer-based reconstruction module (RM), with semantic AD performed in a text space, using learnable pseudo class names to handle unknown or ambiguous class labels. LTAD is trained in two phases: Phase 1 learns the pseudo class names and a VAE-like feature synthesizer conditioned on those names; Phase 2 trains the RM and patch-projection modules on real and synthetically augmented data. Experiments on long-tailed splits of MVTec, VisA, and DAGM show that LTAD outperforms state-of-the-art methods in most imbalance configurations, and ablations confirm the contributions of the learnable class names, the data augmentation, and the dual-score fusion.
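To make the dual-score idea concrete, here is a minimal Python sketch of how a reconstruction score and a semantic score might be computed and fused. It is an illustration under stated assumptions, not LTAD's actual implementation: the per-patch feature lists, the text embeddings standing in for prompts built from a learned pseudo class name (for example "normal <name>" vs. "anomalous <name>"), max pooling over patches, the softmax temperature of 0.07, and the weighted-sum fusion weight `alpha` are all hypothetical choices.

```python
import math

def reconstruction_score(features, reconstructed):
    """Image-level score: largest squared L2 error between each patch feature
    and its reconstruction (max pooling over patches is an assumption)."""
    errors = [
        sum((f - r) ** 2 for f, r in zip(patch, rec))
        for patch, rec in zip(features, reconstructed)
    ]
    return max(errors)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_score(image_emb, normal_text_emb, anomalous_text_emb, temperature=0.07):
    """Binary classifier in text space: probability assigned to the
    'anomalous' prompt by a two-way softmax over cosine similarities.
    The text embeddings are hypothetical stand-ins for prompts built
    from a learned pseudo class name."""
    logits = [cosine(image_emb, normal_text_emb) / temperature,
              cosine(image_emb, anomalous_text_emb) / temperature]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[1] / sum(exps)

def fused_score(rec_score, sem_score, alpha=0.5):
    """Hypothetical fusion: convex combination of the two scores
    (assumes both are already on comparable scales)."""
    return alpha * rec_score + (1 - alpha) * sem_score

# Toy usage with 2-D vectors: one patch is reconstructed poorly,
# while the image embedding matches the "normal" prompt.
rec = reconstruction_score([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 1.0]])
sem = semantic_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
print(rec, sem, fused_score(rec, sem))
```

A weighted sum is only meaningful when both scores live on comparable scales; a practical variant would first normalize the reconstruction error, for instance with statistics gathered on normal training images.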
Abstract
Anomaly detection (AD) aims to identify defective images and localize their defects (if any). Ideally, AD models should be able to detect defects across many image classes, without relying on hard-coded class names that can be uninformative or inconsistent across datasets; learn without anomaly supervision; and be robust to the long-tailed distributions of real-world applications. To address these challenges, we formulate the problem of long-tailed AD by introducing several datasets with different levels of class imbalance, together with metrics for performance evaluation. We then propose a novel method, LTAD, that detects defects across multiple, long-tailed classes without relying on dataset class names. LTAD combines AD by reconstruction with semantic AD. AD by reconstruction is implemented with a transformer-based reconstruction module. Semantic AD is implemented with a binary classifier that relies on learned pseudo class names and a pretrained foundation model. These modules are learned in two phases. Phase 1 learns the pseudo class names and a variational autoencoder (VAE) for feature synthesis, which augments the training data to combat long tails. Phase 2 then learns the parameters of the reconstruction and classification modules of LTAD. Extensive experiments on the proposed long-tailed datasets show that LTAD substantially outperforms state-of-the-art methods for most forms of dataset imbalance. The long-tailed dataset split is available at https://zenodo.org/records/10854201 .
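The abstract does not say how the imbalanced splits were constructed; the released splits are at the Zenodo link above. As a hedged illustration only, a common way to induce a long tail is an exponential profile: class i of C keeps roughly n_max * r^(-i/(C-1)) images, where r is the head-to-tail imbalance ratio. The helpers `long_tailed_counts` and `subsample_split` below are hypothetical and may differ from the paper's actual protocol.

```python
import random

def long_tailed_counts(max_count, num_classes, imbalance_ratio):
    """Exponential-profile class sizes (hypothetical helper): the head class
    keeps max_count images, the tail class keeps about
    max_count / imbalance_ratio, with geometric decay in between."""
    if num_classes == 1:
        return [max_count]
    counts = []
    for i in range(num_classes):
        frac = i / (num_classes - 1)  # 0 for the head class, 1 for the tail class
        n = max_count * imbalance_ratio ** (-frac)
        counts.append(max(1, round(n)))  # keep at least one image per class
    return counts

def subsample_split(images_by_class, class_order, counts, seed=0):
    """Randomly keep counts[i] images of class_order[i] (hypothetical helper).
    Which class plays the head and which the tail is a design choice."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    split = {}
    for name, k in zip(class_order, counts):
        pool = images_by_class[name]
        split[name] = rng.sample(pool, min(k, len(pool)))
    return split

# Toy usage: three classes of 200 images each, imbalance ratio 100.
classes = ["bottle", "cable", "screw"]
images = {c: [f"{c}_{j:03d}.png" for j in range(200)] for c in classes}
counts = long_tailed_counts(100, len(classes), 100)  # [100, 10, 1]
split = subsample_split(images, classes, counts)
print({c: len(split[c]) for c in classes})
```

Varying the ratio r (and the order of classes) yields splits with different levels of imbalance from the same source dataset.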
