LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis

Hangting Ye, Jinmeng Li, He Zhao, Mingchen Zhuge, Dandan Guo, Yi Chang, Hongyuan Zha

TL;DR

This work addresses the brittleness and privacy challenges of applying LLMs to tabular anomaly detection (TAD) by reframing the LLM as an algorithmist. It introduces LLM-DAS, a two-stage framework in which an LLM generates detector-specific, data-agnostic Python code to synthesize hard anomalies; the code is then instantiated on a dataset to augment training and convert one-class detection into a more discriminative two-class task. Key contributions include detector-aware code generation, dataset-wise anomaly instantiation, and extensive experiments across 36 TAD benchmarks showing robust improvements when LLM-DAS is applied to PCA, IForest, OCSVM, ECOD, and DRL, with analyses validating the importance of detector-awareness and borderline-sample strategies. The approach keeps raw data away from the LLM, is reusable across datasets, and offers a plug-and-play mechanism to strengthen existing detectors without LLM fine-tuning, improving robustness in real-world, heterogeneous tabular settings.

Abstract

Existing anomaly detection (AD) methods for tabular data usually rely on some assumptions about anomaly patterns, leading to inconsistent performance in real-world scenarios. While Large Language Models (LLMs) show remarkable reasoning capabilities, their direct application to tabular AD is impeded by fundamental challenges, including difficulties in processing heterogeneous data and significant privacy risks. To address these limitations, we propose LLM-DAS, a novel framework that repositions the LLM from a "data processor" to an "algorithmist". Instead of being exposed to raw data, our framework leverages the LLM's ability to reason about algorithms. It analyzes a high-level description of a given detector to understand its intrinsic weaknesses and then generates detector-specific, data-agnostic Python code to synthesize "hard-to-detect" anomalies that exploit these vulnerabilities. This generated synthesis program, which is reusable across diverse datasets, is then instantiated to augment training data, systematically enhancing the detector's robustness by transforming the problem into a more discriminative two-class classification task. Extensive experiments on 36 TAD benchmarks show that LLM-DAS consistently boosts the performance of mainstream detectors. By bridging LLM reasoning with classic AD algorithms via programmatic synthesis, LLM-DAS offers a scalable, effective, and privacy-preserving approach to patching the logical blind spots of existing detectors.
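
To make the two-stage idea concrete, the following is a minimal sketch and not the authors' generated code. It assumes a toy PCA detector that scores points by reconstruction error, a hand-written stand-in for the LLM-generated synthesis program (`synthesize_hard_anomalies`, a hypothetical name), and a simple k-nearest-neighbour two-class scorer in place of whatever classifier the paper trains. The synthesis function never inspects a specific dataset's semantics; it only exploits the detector's blind spot by moving points far along the retained principal directions while leaving the reconstruction residual untouched.

```python
import numpy as np

def fit_pca_detector(X, k):
    """Toy source detector: score = reconstruction error outside the top-k PCA subspace."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:k]  # shape (k, d), orthonormal rows

    def score(Z):
        centered = Z - mean
        recon = centered @ components.T @ components
        return np.linalg.norm(centered - recon, axis=1)

    return score, mean, components

def synthesize_hard_anomalies(X, mean, components, scale=4.0, seed=0):
    """Hypothetical synthesis program (assumption, not from the paper).

    Places each point `scale` standard deviations out along every retained
    principal direction, with a random sign, and keeps its original residual,
    so the PCA reconstruction error is unchanged.
    """
    rng = np.random.default_rng(seed)
    centered = X - mean
    coords = centered @ components.T           # in-subspace coordinates, (n, k)
    residual = centered - coords @ components  # part the detector actually scores
    signs = rng.choice([-1.0, 1.0], size=coords.shape)
    shifted = signs * scale * coords.std(axis=0)
    return mean + shifted @ components + residual

def fit_two_class_knn(X_normal, X_synth, k=5):
    """Stand-in two-class scorer: fraction of the k nearest training points that are synthetic."""
    train = np.vstack([X_normal, X_synth])
    labels = np.concatenate([np.zeros(len(X_normal)), np.ones(len(X_synth))])

    def score(Z):
        dists = np.linalg.norm(Z[:, None, :] - train[None, :, :], axis=2)
        nearest = np.argsort(dists, axis=1)[:, :k]
        return labels[nearest].mean(axis=1)

    return score

# Illustrative data: two high-variance features and three low-variance ones.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 0.1, 0.1, 0.1])

source_score, mean, components = fit_pca_detector(X_train, k=2)
X_synth = synthesize_hard_anomalies(X_train, mean, components)
enhanced_score = fit_two_class_knn(X_train, X_synth)
```

In this sketch the source detector assigns each synthetic point the same score as the normal point it was derived from, which is the sense in which the anomalies are "hard" for it, while the two-class scorer trained on normal plus synthetic samples can separate them.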

Paper Structure

This paper contains 14 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Comparison between traditional TAD methods and our LLM-DAS. By synthesizing "hard" anomalies, LLM-DAS effectively transforms the original one-class classification problem into a more discriminative two-class classification problem, thereby strengthening the detector and yielding a more nuanced decision boundary.
  • Figure 2: The LLM-DAS framework consists of two phases: (1) a data-agnostic reasoning phase, where an LLM generates reusable anomaly synthesis code for one type of detector, and (2) a data-specific phase, where this code is applied to generate challenging anomalies for detector enhancement.
  • Figure 3: Comparison of all models’ performance across different datasets (in AUC-PR). The red triangles represent the average value. AnoLLM has two versions (135M and 360M parameters).
  • Figure 4: Performance comparison of anomaly synthesis methods (left) and LLM-DAS designs across different datasets (right) in terms of AUC-PR.
  • Figure 5: Visualization of synthetic hard anomalies and score distributions on the Thyroid test set. (a) t-SNE plots of normal, real anomaly, and LLM-DAS–generated anomaly samples. (b) Kernel density estimation (KDE) of anomaly scores from the source detector $f_t$. (c) KDE of anomaly scores from the enhanced detector $F_t$.
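
Figures 3 and 4 report results in AUC-PR. As a reminder of what that metric measures, here is a small sketch of average precision, one common estimator of the area under the precision-recall curve. The source does not say which estimator the authors use, so this choice is an assumption, and tied scores are not given any special treatment here.

```python
import numpy as np

def average_precision(y_true, scores):
    """Average precision: mean of the precision values at the rank of each true anomaly."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Rank samples from most to least anomalous.
    order = np.argsort(-scores, kind="stable")
    y_sorted = y_true[order]
    true_positives = np.cumsum(y_sorted)
    precision_at_rank = true_positives / np.arange(1, len(y_sorted) + 1)
    # Only ranks that hold a true anomaly contribute to the average.
    return float((precision_at_rank * y_sorted).sum() / y_sorted.sum())

# Labels: 1 = anomaly, 0 = normal; higher score = more anomalous.
ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Unlike AUC-ROC, this metric depends on the anomaly ratio, which is why it is often preferred when anomalies are rare.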