Anomaly Detection of Tabular Data Using LLMs

Aodong Li, Yunhan Zhao, Chen Qiu, Marius Kloft, Padhraic Smyth, Maja Rudolph, Stephan Mandt

TL;DR

This paper shows that pre-trained LLMs are zero-shot batch-level anomaly detectors: without any extra distribution-specific model fitting, they can discover hidden outliers in a batch of data, demonstrating an ability to identify low-density data regions.

Abstract

Large language models (LLMs) have shown their potential in long-context understanding and mathematical reasoning. In this paper, we study the problem of using LLMs to detect tabular anomalies and show that pre-trained LLMs are zero-shot batch-level anomaly detectors. That is, without extra distribution-specific model fitting, they can discover hidden outliers in a batch of data, demonstrating their ability to identify low-density data regions. For LLMs that are not well aligned with anomaly detection and frequently output factual errors, we apply simple yet effective data-generating processes to simulate synthetic batch-level anomaly detection datasets and propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies. Experiments on a large anomaly detection benchmark (ODDS) showcase i) that GPT-4 has on-par performance with state-of-the-art transductive learning-based anomaly detection methods and ii) the efficacy of our synthetic dataset and fine-tuning strategy in aligning LLMs to this task.
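
To make the zero-shot setup concrete (see Figure 1 below), here is a minimal Python sketch of batch-level anomaly detection with an LLM. The serialization format and the `query_llm` wrapper are illustrative assumptions, not the paper's exact prompt or API; any chat-style LLM (e.g., GPT-4) could back `query_llm`.

```python
import re
from typing import Callable, List, Sequence

def serialize_batch(batch: Sequence[Sequence[float]]) -> str:
    """Serialize a batch of tabular rows into an index-prefixed text listing."""
    return "\n".join(
        f"{i}: " + " ".join(f"{v:.4f}" for v in row)
        for i, row in enumerate(batch)
    )

def detect_anomalies(batch, query_llm: Callable[[str, str], str]) -> List[int]:
    """Ask an LLM which rows of the batch are anomalous and parse its answer.

    `query_llm(system, user)` is a hypothetical wrapper around any chat-style
    LLM API; it returns the model's raw text response.
    """
    system = "Only answer data indices."  # regularizes the response format
    user = (
        "Below is a batch of tabular data, one row per line, prefixed by its "
        "index. Which rows are anomalies?\n\n" + serialize_batch(batch)
    )
    response = query_llm(system, user)
    # Keep every integer in the response that is a valid row index.
    return sorted({int(m) for m in re.findall(r"\d+", response)
                   if int(m) < len(batch)})
```

Constraining the response via the system message keeps parsing trivial; the same serialization can be reused for the fine-tuning examples sketched later.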

Paper Structure

This paper contains 29 sections, 5 figures, 1 table, and 3 algorithms.

Figures (5)

  • Figure 1: Illustration of batch-level anomaly detection with LLMs. We serialize the data batch into text and apply our proposed prompts as input to the LLM. The LLM then responds with the indices of the abnormal data points based on its internal knowledge (cf. the prompting sketch above). The system message "Only answer data indices" regularizes the LLM's responses and ensures they are easy to parse.
  • Figure 2: Illustration of Llama2 for batch-level anomaly detection before and after our fine-tuning strategy. With the same input prompt, Llama2-70b (the 70-billion-parameter version) makes factual mistakes: two false negatives (missing 5 and 10) and one false positive (incorrectly flagging 14). These results were obtained from https://www.llama2.ai. In contrast, our fine-tuned 7-billion-parameter Llama2-AD (10x smaller than Llama2-70b) succeeds in discovering all anomalies.
  • Figure 3: Graphical models of the synthetic data-generating processes. (Left) We use a binary Gaussian mixture (i.e., $K=2$) to generate a batch of continuous data of size $N$: one Gaussian corresponds to normal data and the other to abnormal data. (Right) A multinomial mixture model ($K=2$) for discrete data, where one multinomial is for normal data and one for abnormal data. $\pi$ controls the anomaly ratio. Specifics of the random variables in the models are given in Appendix \ref{app:syndata-eg}. A sampling sketch for both processes follows this list.
  • Figure 4: LLMs can detect low-density regions in a contaminated data distribution. We use Mistral-AD, fine-tuned from Mistral, as the demonstrating LLM. The normal data distribution is a mixture of two Gaussians located at -25 and 25. The contaminated data distribution $p(x)$ (blue) combines the normal distribution with a wide uniform distribution spanning the interval $[-100, 100]$ at a contamination ratio of 0.1. We sample 500 independent batches from $p(x)$ and ask the LLM to predict the anomalies in each batch using our proposed method. We collect all predicted anomalies and estimate their density with a kernel density estimator, shown as $\hat{p}_a(x)$ (orange). $\hat{p}_a(x)$ successfully captures the three low-density regions of $p(x)$, demonstrating the LLM's ability to detect anomalies. More details are in Appendix \ref{app:exp-detail}. An outline of this experiment in code follows this list.
  • Figure 5: Examples of the synthetic data used for fine-tuning; a sketch of constructing such records follows this list.
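
Below is a minimal sketch of the Figure 3 data-generating processes. The specific parameter values (means, scales, and category probabilities) are illustrative assumptions, since the paper defers the exact settings to its appendix.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_continuous_batch(n=20, dim=2, pi=0.1):
    """Binary Gaussian mixture (K=2): one component for normal data,
    one for anomalies; `pi` is the anomaly ratio."""
    labels = rng.random(n) < pi                       # z_i ~ Bernoulli(pi)
    mu_normal = np.zeros(dim)                         # assumed component means
    mu_abnormal = 4.0 * np.ones(dim)
    x = np.where(labels[:, None],
                 rng.normal(mu_abnormal, 1.0, (n, dim)),
                 rng.normal(mu_normal, 1.0, (n, dim)))
    return x, labels

def sample_discrete_batch(n=20, vocab=5, pi=0.1):
    """Multinomial mixture (K=2) for discrete data: one distribution for
    normal data, one for anomalies."""
    labels = rng.random(n) < pi
    p_normal = np.full(vocab, 1.0 / vocab)            # assumed distributions
    p_abnormal = np.r_[0.8, np.full(vocab - 1, 0.2 / (vocab - 1))]
    x = np.array([rng.choice(vocab, p=(p_abnormal if a else p_normal))
                  for a in labels])
    return x, labels
```

Because the ground-truth labels are known by construction, batches sampled this way can directly supervise end-to-end fine-tuning.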
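
The Figure 4 experiment can be outlined as follows, reusing the hypothetical `detect_anomalies`/`query_llm` helpers from the sketch after the abstract; the batch size is an assumption.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def sample_contaminated_batch(n=30, contamination=0.1):
    """p(x): two normal-data Gaussians at -25 and 25, contaminated by
    a wide uniform distribution over [-100, 100]."""
    is_outlier = rng.random(n) < contamination
    normal = rng.normal(np.where(rng.random(n) < 0.5, -25.0, 25.0), 1.0)
    return np.where(is_outlier, rng.uniform(-100.0, 100.0, n), normal)

# Collect LLM-predicted anomalies over 500 independent batches, then estimate
# their density with a kernel density estimator.
predicted = []
for _ in range(500):
    batch = sample_contaminated_batch()
    # `detect_anomalies` / `query_llm`: hypothetical helpers sketched earlier.
    idx = detect_anomalies(batch.reshape(-1, 1), query_llm)
    predicted.extend(batch[idx])
p_a_hat = gaussian_kde(predicted)  # evaluate on a grid to plot the orange curve
```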
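
Figure 5's fine-tuning examples pair a serialized synthetic batch with its ground-truth anomaly indices as the target. Here is a sketch of constructing one such record; the JSON field names and prompt wording are assumptions, not the paper's exact format.

```python
import json
import numpy as np

rng = np.random.default_rng(0)

def make_finetuning_example(batch: np.ndarray, anomaly_idx) -> dict:
    """One supervised record: prompt = serialized batch + question,
    completion = the ground-truth anomaly indices (hypothetical schema)."""
    rows = "\n".join(
        f"{i}: " + " ".join(f"{v:.4f}" for v in row)
        for i, row in enumerate(batch)
    )
    return {
        "system": "Only answer data indices.",
        "prompt": f"Which rows are anomalies?\n\n{rows}",
        "completion": " ".join(str(i) for i in sorted(anomaly_idx)),
    }

# Pair a synthetic batch (e.g., from the mixture samplers above) with its
# known labels and emit a JSONL line for end-to-end fine-tuning.
x = np.vstack([rng.normal(0.0, 1.0, (18, 2)), rng.normal(4.0, 1.0, (2, 2))])
print(json.dumps(make_finetuning_example(x, [18, 19])))
```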