Research and application of Transformer based anomaly detection model: A literature review

Mingrui Ma; Lansheng Han; Chunjie Zhou

TL;DR

This survey defines and unifies Transformer-based anomaly detection across domains, outlining a taxonomy of Vanilla Transformer, ViT/DeiT, TabTransformer, Swin, Informer, Conformer, Performer, Set Transformer, BERT variants, and hybrid models. It maps these architectures to application areas (logs, images, video, time series, flow) and catalogs datasets and evaluation metrics to enable cross-study benchmarking. The authors identify core challenges, namely data imbalance, interpretability, multi-class settings, OOD and multi-modal detection, and efficiency, and propose forward-looking directions such as zero-shot and lifelong learning, explainable and contrastive/self-supervised approaches, and broader multi-modal applications. Overall, the review aims to guide researchers toward more robust, scalable, and transferable Transformer-based anomaly detection systems with practical impact in diverse domains.

Abstract

The Transformer, one of the most advanced neural network models in Natural Language Processing (NLP), has found diverse applications in anomaly detection. To inspire research on Transformer-based anomaly detection, this review offers a fresh perspective on the concept of anomaly detection. We explore the current challenges of anomaly detection and explain in detail how the Transformer and its variants operate in anomaly detection tasks. We also delineate application scenarios for Transformer-based anomaly detection models and discuss the datasets and evaluation metrics they employ. Furthermore, this review highlights the key challenges in Transformer-based anomaly detection research and analyzes future research trends in this domain. The review compiles over 100 core references related to Transformer-based anomaly detection. To the best of our knowledge, this is the first comprehensive review focused on Transformer research in the context of anomaly detection. We hope that this paper provides detailed technical information to researchers interested in Transformer-based anomaly detection tasks.

Paper Structure

This paper contains 50 sections, 51 equations, 12 figures, and 4 tables.

Introduction
Concepts of anomaly detection
Relationship of different training methods
Supervised learning
Semi-supervised learning
Unsupervised learning
Self-supervised learning
Weak-supervised learning
Research of Transformer based anomaly detection
Anomaly detection based on Vanilla Transformer
Anomaly detection based on ViT
Anomaly detection based on Data-efficient image Transformer (DeiT)
Anomaly detection based on TabTransformer
Anomaly detection based on Swin-Transformer
Anomaly detection based on Informer
...and 35 more sections

Figures (12)

Figure 1: The relationship diagram of supervised, semi-supervised, unsupervised, self-supervised, and weak-supervised learning
Figure 2: Relationship between different Transformer variants
Figure 3: The structure of DeiT model
Figure 4: The structure of TabTransformer (NOTE: The figure is from the paper RF31)
Figure 5: The structure of Swin-Transformer (NOTE: The figure is from the paper RF79)
...and 7 more figures
