Omni-AD: Learning to Reconstruct Global and Local Features for Multi-class Anomaly Detection

Jiajie Quan, Ao Tong, Yuxuan Cai, Xinwei He, Yulong Wang, Yang Zhou

TL;DR

Omni-AD targets multi-class unsupervised anomaly detection (MUAD) by learning both global and local normal patterns to avoid reconstruction shortcuts. It introduces Omni-block, a two-branch decoder block that combines a global attention pathway built on learnable tokens with a local depthwise convolution pathway to reconstruct normal features across scales. The approach achieves state-of-the-art results on MVTec-AD, VisA, and Real-IAD, with ablations validating the effectiveness of the learnable tokens, the necessity of both branches, and the chosen architectural depths. This work offers a practical, efficient, unified MUAD solution with potential applications beyond industrial anomaly detection.

Abstract

In multi-class unsupervised anomaly detection (MUAD), reconstruction-based methods learn to map input images to normal patterns to identify anomalous pixels. However, this strategy easily falls into the well-known "learning shortcut" issue, in which decoders fail to capture normal patterns and instead naively reconstruct both normal and abnormal samples. To address this, we propose to learn the input features in both global and local manners, forcing the network to memorize the normal patterns more comprehensively. Specifically, we design a two-branch decoder block, named Omni-block. One branch performs global feature learning: we serialize two self-attention blocks but replace the query and the (key, value) pair with learnable tokens, respectively, thus capturing global features of normal patterns concisely and thoroughly. The local branch comprises depthwise separable convolutions, whose locality enables effective and efficient learning of local features of normal patterns. By stacking Omni-blocks, we build a framework, Omni-AD, that learns normal patterns at different granularities and reconstructs them progressively. Comprehensive experiments on public anomaly detection benchmarks show that our method outperforms state-of-the-art approaches in MUAD. Code is available at https://github.com/easyoo/Omni-AD.git
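The two-branch idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the wiring of the two attention steps, the token counts, the absence of projections and normalization, and the choice to sum the two branches are all assumptions made for illustration, and every name (`global_branch`, `depthwise_separable_conv`, `omni_block_sketch`, the `params` keys) is hypothetical. The official code lives in the repository linked above.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention: q (Nq, d), k and v (Nk, d) -> (Nq, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def global_branch(x, q_tokens, k_tokens, v_tokens):
    """x: (N, d) flattened input features.

    Assumed wiring (one possible reading of the abstract):
      step 1: learnable queries (N, d) attend over the input features;
      step 2: the result queries learnable key/value tokens (M, d).
    """
    z = attention(q_tokens, x, x)            # query replaced by learnable tokens
    return attention(z, k_tokens, v_tokens)  # (key, value) replaced by learnable tokens


def depthwise_separable_conv(fmap, dw_kernels, pw_weights):
    """fmap: (C, H, W); dw_kernels: (C, 3, 3); pw_weights: (C_out, C)."""
    c, h, w = fmap.shape
    padded = np.pad(fmap, ((0, 0), (1, 1), (1, 1)))  # zero padding, "same" output size
    out = np.zeros_like(fmap)
    # Depthwise step: each channel is filtered by its own 3x3 kernel.
    for dy in range(3):
        for dx in range(3):
            out += dw_kernels[:, dy, dx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    # Pointwise step: a 1x1 convolution mixes channels.
    return np.einsum("oc,chw->ohw", pw_weights, out)


def omni_block_sketch(fmap, params):
    """Combine both branches by summation (an assumption, not from the source)."""
    c, h, w = fmap.shape
    x = fmap.reshape(c, h * w).T  # (N, C) with N = H * W
    g = global_branch(x, params["q"], params["k"], params["v"])
    g = g.T.reshape(c, h, w)
    loc = depthwise_separable_conv(fmap, params["dw"], params["pw"])
    return g + loc


rng = np.random.default_rng(0)
C, H, W, M = 8, 4, 4, 6  # illustrative sizes only
params = {
    "q": rng.normal(size=(H * W, C)),  # learnable query tokens
    "k": rng.normal(size=(M, C)),      # learnable key tokens
    "v": rng.normal(size=(M, C)),      # learnable value tokens
    "dw": rng.normal(size=(C, 3, 3)),
    "pw": rng.normal(size=(C, C)),
}
fmap = rng.normal(size=(C, H, W))
out = omni_block_sketch(fmap, params)
print(out.shape)
```

In this sketch, every output row of the global branch is a convex combination of the learnable value tokens, so that branch cannot copy an input feature through unchanged. This offers one intuition for how learnable tokens could discourage the reconstruction shortcut; whether the actual model behaves this way depends on details not given here.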

Paper Structure

This paper contains 12 sections, 6 equations, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: Overview of Omni-AD. Given an input image, we first use a pretrained network to extract its multi-scale features. Then, we fuse them with the feature fusion neck. Finally, we feed the fused features into the decoder, which comprises a series of Omni-blocks, to reconstruct the multi-scale features progressively.
  • Figure 2: Detailed structure of the proposed Omni-block.
  • Figure 3: Qualitative results on MVTec and VisA.