InCTRLv2: Generalist Residual Models for Few-Shot Anomaly Detection and Segmentation

Jiawen Zhu, Mengjia Niu, Guansong Pang

Abstract

While recent anomaly detection (AD) methods have made substantial progress in recognizing abnormal patterns within specific domains, most of them are specialist models that are trained on abundant samples from a specific target dataset and struggle to generalize to unseen datasets. To address this limitation, the paradigm of Generalist Anomaly Detection (GAD) has emerged in recent years, aiming to learn a single generalist model to detect anomalies across diverse domains without retraining. To this end, this work introduces InCTRLv2, a novel few-shot Generalist Anomaly Detection and Segmentation (GADS) framework that significantly extends our previously proposed GAD model, InCTRL. Building on InCTRL's idea of learning in-context residuals with few-shot normal examples to detect anomalies, InCTRLv2 introduces two new, complementary perspectives of anomaly perception under a dual-branch framework. This is accomplished by two novel modules built upon InCTRL: i) Discriminative Anomaly Score Learning (DASL) in the main branch, trained with both normal and abnormal data, which learns a semantic-guided abnormality and normality space that supports the classification of query samples from both the abnormality and normality perspectives; and ii) One-class Anomaly Score Learning (OASL) in an auxiliary branch, trained with only normal data, which learns generalized normality patterns in a semantic space and focuses on detecting anomalies solely through the lens of normality. Both branches are guided by rich visual-text semantic priors encoded by large-scale vision-language models. Together, they offer a dual semantic perspective for AD: one emphasizes normal-abnormal discrimination, while the other emphasizes normality-deviated semantics. Extensive experiments on ten AD datasets demonstrate that InCTRLv2 achieves state-of-the-art (SotA) performance in both anomaly detection and segmentation tasks across various settings.

Paper Structure

This paper contains 35 sections, 17 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Illustration of the Generalist Anomaly Detection and Segmentation (GADS) paradigm for the few-shot AD task. A model is trained on auxiliary datasets and leverages few-shot normal images as in-context sample prompts paired with query samples (Top). It can then be directly applied to diverse target datasets from different domains without requiring domain-specific retraining (Bottom).
  • Figure 2: Overview of the training process of InCTRLv2. It extends the single-branch residual learning framework of InCTRL into a dual-branch architecture, consisting of a main branch and an auxiliary branch. The main branch employs a Discriminative Anomaly Score Learning (DASL) module to learn a semantic-guided decision space that jointly models abnormality and normality, enabling anomaly discrimination from both perspectives. In parallel, the auxiliary branch adopts a One-class Anomaly Score Learning (OASL) module, which is trained exclusively on normal samples to capture generalized normality patterns in the semantic space. Together, these two branches complement each other by combining discriminative abnormality modeling with normality-driven guidance.
  • Figure 3: Detailed illustration of the image-level in-context learning and the multi-layer patch-level in-context learning mechanisms in the DASL module.
  • Figure 4: Left: Image-level AUROC results for different values of $\alpha$. Right: Pixel-level AUROC results for different values of $\beta$.
  • Figure 5: Visualization of anomaly maps generated by the DASL module alone ($\mathbf{M}_p$) and by the full InCTRLv2 ($\mathbf{M}$). The anomaly score maps are generated under the 'VisA to MVTecAD' setting.
  • ...and 1 more figure
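The abstract and the captions for Figures 4-5 suggest that InCTRLv2 combines the outputs of its two branches, with a weight $\alpha$ governing the image-level scores and a weight $\beta$ governing the pixel-level anomaly maps ($\mathbf{M}_p$ versus the final $\mathbf{M}$). The exact fusion rule is not given in this excerpt, so the convex-combination form and all function/variable names below are assumptions, shown only as a minimal sketch of the dual-branch idea:

```python
import numpy as np

# Hypothetical sketch: fuse the main-branch (DASL) and auxiliary-branch (OASL)
# outputs at inference time. The weighted-sum form is an assumption; the paper's
# actual fusion rule may differ.

def fuse_image_scores(s_dasl: float, s_oasl: float, alpha: float = 0.5) -> float:
    """Combine image-level anomaly scores from the two branches with a
    convex weight alpha (cf. Figure 4, left)."""
    return alpha * s_dasl + (1.0 - alpha) * s_oasl

def fuse_anomaly_maps(m_p: np.ndarray, m_aux: np.ndarray,
                      beta: float = 0.5) -> np.ndarray:
    """Combine the DASL patch-level map M_p with a hypothetical auxiliary-branch
    map into a final map M (cf. Figure 5), again as a convex combination."""
    return beta * m_p + (1.0 - beta) * m_aux

# Example: a query scored high by the main branch, moderate by the auxiliary one.
score = fuse_image_scores(0.9, 0.5, alpha=0.6)  # ≈ 0.74
```

A convex combination keeps the fused score in the same range as the branch scores, which matches the sensitivity analysis in Figure 4 where $\alpha$ and $\beta$ are swept over a fixed interval.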