LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis

Hangting Ye; Jinmeng Li; He Zhao; Mingchen Zhuge; Dandan Guo; Yi Chang; Hongyuan Zha

TL;DR

This work addresses the brittleness and privacy challenges of applying LLMs to tabular anomaly detection (TAD) by reframing the LLM as an algorithmist. It introduces LLM-DAS, a two-stage framework: an LLM first generates detector-specific, data-agnostic Python code for synthesizing hard anomalies, and that code is then instantiated on each dataset to augment training, converting one-class detection into a more discriminative two-class task. Key contributions include detector-aware code generation, dataset-wise anomaly instantiation, and extensive experiments across 36 TAD benchmarks showing robust improvements when LLM-DAS is applied to PCA, IForest, OCSVM, ECOD, and DRL, with analyses validating the importance of detector-awareness and borderline-sample strategies. Because the LLM never sees raw data, the approach preserves data privacy; the generated code is reusable across datasets and provides a scalable, plug-and-play mechanism for strengthening existing detectors without LLM fine-tuning, improving robustness in real-world, heterogeneous tabular settings.

Abstract

Existing anomaly detection (AD) methods for tabular data typically rely on assumptions about anomaly patterns, leading to inconsistent performance in real-world scenarios. While Large Language Models (LLMs) show remarkable reasoning capabilities, their direct application to tabular AD (TAD) is impeded by fundamental challenges, including difficulties in processing heterogeneous data and significant privacy risks. To address these limitations, we propose LLM-DAS, a novel framework that repositions the LLM from a "data processor" to an "algorithmist". Instead of being exposed to raw data, the LLM is used for its ability to reason about algorithms. It analyzes a high-level description of a given detector to understand its intrinsic weaknesses and then generates detector-specific, data-agnostic Python code to synthesize "hard-to-detect" anomalies that exploit these vulnerabilities. This generated synthesis program, which is reusable across diverse datasets, is then instantiated on each dataset to augment the training data, systematically enhancing the detector's robustness by transforming the problem into a more discriminative two-class classification task. Extensive experiments on 36 TAD benchmarks show that LLM-DAS consistently boosts the performance of mainstream detectors. By bridging LLM reasoning with classic AD algorithms via programmatic synthesis, LLM-DAS offers a scalable, effective, and privacy-preserving approach to patching the logical blind spots of existing detectors.
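To make the two-stage idea concrete, the following is a minimal sketch of what a detector-specific, data-agnostic synthesis program and its instantiation could look like. It is not the code generated in the paper. The function names, the `scale` and `n_components` parameters, and the choice of weakness are all assumptions made for illustration: a PCA detector that scores by reconstruction error is taken to be blind to points displaced within the principal subspace.

```python
import numpy as np


def synthesize_pca_hard_anomalies(X, n_anomalies, n_components=2, scale=3.0, seed=0):
    """Hypothetical synthesis program targeting a PCA reconstruction-error detector.

    Assumed weakness: points shifted *within* the principal subspace keep the
    same reconstruction error as the normal samples they were derived from.
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    k = min(n_components, vt.shape[0])
    components = vt[:k]
    # Standard deviation of the data along each principal direction.
    std_along = s[:k] / np.sqrt(max(len(X) - 1, 1))
    # Start from randomly chosen normal samples ...
    base = X[rng.integers(0, len(X), size=n_anomalies)]
    # ... and push them `scale` standard deviations along each direction.
    signs = rng.choice([-1.0, 1.0], size=(n_anomalies, k))
    offsets = (signs * scale * std_along) @ components
    return base + offsets


def build_two_class_training_set(X_normal, synthesize, n_anomalies):
    """Instantiate the synthesis program on one dataset and attach labels."""
    X_anom = synthesize(X_normal, n_anomalies)
    X = np.vstack([X_normal, X_anom])
    y = np.concatenate([
        np.zeros(len(X_normal), dtype=int),  # 0 = normal
        np.ones(len(X_anom), dtype=int),     # 1 = synthetic anomaly
    ])
    return X, y
```

The synthesis function depends only on the detector's logic and receives the data as an argument, so the same program can be reused on any numeric tabular dataset. The labeled output of `build_two_class_training_set` can then be passed to any binary classifier, which is one plausible reading of the "two-class" reformulation described above.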
