Boosting Anomaly Detection Using Unsupervised Diverse Test-Time Augmentation

Seffi Cohen, Niv Goldshlager, Lior Rokach, Bracha Shapira

TL;DR

This work addresses anomaly detection in tabular data without labeled anomalies by introducing TTAD, a test-time augmentation framework. TTAD has two components: a nearest-neighbor data selector that uses a distance metric learned by a Siamese network, and an augmentation producer (k-Means centroids or SMOTE) that generates diverse, in-distribution augmentations of each test instance. A detector scores the augmented samples, and the scores are aggregated. The approach yields consistent AUC improvements over baselines across eight ODDS datasets, with the learned metric and k-Means augmentation often providing the strongest gains, while Gaussian-noise TTA can be detrimental. Practically, TTAD enhances tabular anomaly detection without labeled data: its distance metric is learned from pseudo-labels that an isolation forest assigns to the test set.

Abstract

Anomaly detection is a well-known task that involves the identification of abnormal events that occur relatively infrequently. Methods for improving anomaly detection performance have been widely studied. However, no studies utilizing test-time augmentation (TTA) for anomaly detection in tabular data have been performed. TTA involves aggregating the predictions of several synthetic versions of a given test sample; TTA produces different points of view for a specific test instance and might decrease its prediction bias. We propose the Test-Time Augmentation for anomaly Detection (TTAD) technique, a TTA-based method aimed at improving anomaly detection performance. TTAD augments a test instance based on its nearest neighbors; various methods, including the k-Means centroid and SMOTE methods, are used to produce the augmentations. Our technique utilizes a Siamese network to learn an advanced distance metric when retrieving a test instance's neighbors. Our experiments show that the anomaly detector that uses our TTA technique achieved significantly higher AUC results on all datasets evaluated.
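
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: plain Euclidean nearest neighbors stand in for the Siamese-learned metric, an isolation forest fitted on separate normal data plays the detector, and the function name `ttad_scores`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors


def ttad_scores(X_test, detector, n_neighbors=8, n_augmentations=3, seed=0):
    """Return one aggregated anomaly score per test instance (higher = more anomalous)."""
    X_test = np.asarray(X_test, dtype=float)
    # Data selector (assumption): Euclidean neighbors replace the learned metric.
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X_test)
    _, neighbor_idx = nn.kneighbors(X_test)

    scores = np.empty(len(X_test))
    for i, idx in enumerate(neighbor_idx):
        neighborhood = X_test[idx]  # typically contains the instance itself
        # Augmentation producer: k-Means centroids of the selected neighborhood.
        km = KMeans(n_clusters=n_augmentations, n_init=10, random_state=seed)
        centroids = km.fit(neighborhood).cluster_centers_
        batch = np.vstack([X_test[i], centroids])
        # score_samples is higher for normal points, so negate before averaging.
        scores[i] = np.mean(-detector.score_samples(batch))
    return scores


# Toy data (assumption): Gaussian inliers plus three shifted outliers.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
X_test = np.vstack([rng.normal(size=(40, 2)), rng.normal(6.0, 0.5, size=(3, 2))])

detector = IsolationForest(random_state=0).fit(X_train)
scores = ttad_scores(X_test, detector)
```

Averaging the detector's score over the original instance and its augmentations is the aggregation step; swapping the k-Means call for another producer leaves the rest of the loop unchanged.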

Paper Structure

This paper contains 26 sections, 5 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: An overview of the TTAD technique. (a) The test set. (b) An isolation forest is used to pseudo-label the test set. (c) A custom distance metric for the nearest-neighbor data selector is learned using a Siamese network. (d) TTAD is applied to each instance in the test set. (e) For each test instance, the data selector component selects a subset of instances to serve as a training set for generating augmented instances. (f) The augmentation producer component generates diverse augmented instances. (g) The instances are scored by an anomaly detector, and the anomaly scores are aggregated by taking their mean.
  • Figure 2: A subset of comparable data is selected for each test instance by the NN model in the data selector component.
  • Figure 3: The architecture of the Siamese network: $x_i$ and $x_j$ are the input samples. Their embeddings are obtained by two identical neural networks. The last layer outputs the distance between the input pair embeddings.
  • Figure 4: Generation of diverse augmentations using the k-Means model’s centroids on the data selected by the data selector (component 1).
  • Figure 5: Producing synthetic samples with SMOTE by randomly interpolating new points from existing instances.
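
The interpolation idea behind Figure 5 can be illustrated with a short sketch. It is a simplification and an assumption about the setup: each synthetic point is placed at a random position on the segment between a test instance and one of its selected neighbors, whereas a full SMOTE implementation also handles neighbor search and class bookkeeping. The name `smote_like_augment` and the toy inputs are hypothetical.

```python
import numpy as np


def smote_like_augment(x, neighbors, n_augmentations=4, seed=0):
    """Interpolate synthetic points on segments between x and random neighbors."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)
    # Pick one neighbor (with replacement) for each synthetic point.
    picks = neighbors[rng.integers(0, len(neighbors), size=n_augmentations)]
    # One interpolation weight in [0, 1) per synthetic point.
    lam = rng.random((n_augmentations, 1))
    return x + lam * (picks - x)


# Toy inputs (assumption): one 2-D test instance and three neighbors.
x = np.array([0.0, 0.0])
neighbors = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
augmented = smote_like_augment(x, neighbors)
```

Because every synthetic point is a convex combination of two existing points, it stays inside the region spanned by the neighborhood, which is what keeps the augmentations in-distribution.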