Deep Anomaly Detection in Text

Andrei Manolache

TL;DR

The thesis addresses anomaly detection (AD) in text by proposing DATE, a transformer-based end-to-end method that uses self-supervised pretext tasks to produce a robust anomaly score. DATE combines two pretext tasks, Replaced Mask Detection and Replaced Token Detection, within a generator–discriminator framework inspired by ELECTRA, and introduces a computationally efficient Pseudo Label score for inference. On 20Newsgroups and AG News, DATE achieves state-of-the-art results in both the semi-supervised and unsupervised settings, outperforming classical baselines (OC-SVM, CVDD, SVDD) and deep competitors (E3 Outlier variants). The method also provides per-token anomaly explanations, and the thesis suggests future directions in self-supervised objectives and contrastive learning for textual anomaly detection, with potential extensions to authorship and stylistic analysis.

Abstract

Deep anomaly detection methods have become increasingly popular in recent years, with methods like Stacked Autoencoders, Variational Autoencoders, and Generative Adversarial Networks greatly improving the state of the art. Other methods augment classical models (such as the One-Class Support Vector Machine) by learning an appropriate kernel function using neural networks. Recent developments in representation learning by self-supervision are proving to be very beneficial in the context of anomaly detection. Inspired by the advancements in anomaly detection using self-supervised learning in computer vision, this thesis aims to develop a method for detecting anomalies by exploiting pretext tasks tailored for text corpora. This approach greatly improves the state of the art on two datasets, 20Newsgroups and AG News, for both semi-supervised and unsupervised anomaly detection, demonstrating the potential of self-supervised anomaly detectors in natural language processing.

Paper Structure

This paper contains 44 sections, 27 equations, 20 figures, and 13 tables.

Figures (20)

  • Figure 1: The anomaly spectrum. Noisy data can be regarded as "weakly" anomalous. (figure adapted from: Aggarwal2013)
  • Figure 2: The AlexNet CNN architecture. (figure from: alexnet)
  • Figure 3: The training and fine-tuning procedures for the BERT Transformer-based model. Transformer models are covered in more detail in the Transformer section, and the BERT architecture in the BERT subsection. (figure from: bert)
  • Figure 4: Examples of general pretext tasks for self-supervised learning. In natural language processing, we could try to predict the next token, or some masked tokens, from a sequence of tokens. In computer vision, we can build a model that reconstructs the missing parts of a corrupted image. (figure from: lecun_ssl_fb)
  • Figure 5: The CBOW and Skip-Gram models. CBOW predicts a word from its context, while Skip-Gram predicts the context from the word. (figure from: mikolov_cbow_skipgram)
  • ...and 15 more figures
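
The difference between CBOW and Skip-Gram described in the Figure 5 caption can be illustrated by the training examples each model consumes. The following is a minimal sketch, not code from the thesis: the function names `cbow_pairs` and `skipgram_pairs`, the symmetric `window` parameter, and the toy sentence are all assumptions made for illustration.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context words, target word) examples.

    CBOW predicts the target word from its surrounding context.
    The symmetric window size is an illustrative assumption.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip tokens that have no neighbours at all
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (input word, context word) examples.

    Skip-Gram reverses the direction: each word predicts every
    word in its context, one pair at a time.
    """
    pairs = []
    for context, target in cbow_pairs(tokens, window):
        pairs.extend((target, c) for c in context)
    return pairs


if __name__ == "__main__":
    # Hypothetical toy sentence, used only to show the two example formats.
    sentence = "the cat sat on the mat".split()
    for context, target in cbow_pairs(sentence, window=1):
        print("CBOW:", context, "->", target)
    for word, context_word in skipgram_pairs(sentence, window=1):
        print("Skip-Gram:", word, "->", context_word)
```

Both functions produce the same underlying word/context relations. CBOW groups the whole context into one input per target, whereas Skip-Gram emits one pair per context word, which is why Skip-Gram yields more training examples for the same text.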