ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection

Sanghyu Yoon, Dongmin Kim, Suhee Yoon, Ye Seul Sim, Seungdong Yoa, Hye-Seung Cho, Soonyoung Lee, Hankook Lee, Woohyung Lim

TL;DR

ReTabAD tackles the lack of semantic context in tabular anomaly detection by introducing 20 richly annotated datasets with structured textual metadata and supporting a broad set of algorithms, including a zero-shot LLM baseline. The benchmark enables context-grounded anomaly detection by preserving raw numerical values, restoring categorical text, and providing dataset-, feature-, and label-level descriptions, facilitating p(x|𝓜) reasoning over semantic context. Empirically, semantic metadata improves detection performance and interpretability, with a +7.6 percentage point AUROC gain on average when full metadata is used, and zero-shot LLMs approaching state-of-the-art trained detectors by leveraging context. The work also analyzes reasoning quality and feature-attribution alignment, showing that domain-aware metadata enables richer, more credible explanations. Overall, ReTabAD offers a unified platform to study context-aware anomaly detection in tabular data and paves the way for semantic-guided detection and interpretation in real-world systems.
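To make the "percentage point AUROC gain" concrete, here is a minimal sketch of how AUROC and a gain in percentage points can be computed. The labels and scores are made-up toy values, not results from the benchmark, and the names `auroc`, `scores_without`, and `scores_with` are chosen for illustration only.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random anomaly (label 1) receives a higher score
    than a random normal sample (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 4 normal samples followed by 2 anomalies.
labels = [0, 0, 0, 0, 1, 1]
scores_without = [0.1, 0.4, 0.35, 0.8, 0.7, 0.3]  # detector without metadata
scores_with = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]     # detector with metadata

# A percentage-point gain is the absolute AUROC difference times 100.
gain_pp = 100.0 * (auroc(labels, scores_with) - auroc(labels, scores_without))
print(f"AUROC gain: {gain_pp:+.1f} percentage points")
```

A gain reported in percentage points is an absolute difference between two AUROC values, not a relative improvement.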

Abstract

In tabular anomaly detection (AD), textual semantics often carry critical signals, as the definition of an anomaly is closely tied to domain-specific context. However, existing benchmarks provide only raw data points without semantic context, overlooking rich textual metadata such as feature descriptions and domain knowledge that experts rely on in practice. This limitation restricts research flexibility and prevents models from fully leveraging domain knowledge for detection. ReTabAD addresses this gap by restoring textual semantics to enable context-aware tabular AD research. We provide (1) 20 carefully curated tabular datasets enriched with structured textual metadata, together with implementations of state-of-the-art AD algorithms including classical, deep learning, and LLM-based approaches, and (2) a zero-shot LLM framework that leverages semantic context without task-specific training, establishing a strong baseline for future research. Furthermore, this work provides insights into the role and utility of textual metadata in AD through experiments and analysis. Results show that semantic context improves detection performance and enhances interpretability by supporting domain-aware reasoning. These findings establish ReTabAD as a benchmark for systematic exploration of context-aware AD.

Paper Structure

This paper contains 30 sections, 3 equations, 8 figures, 15 tables.

Figures (8)

  • Figure 1: Overall Data Collection and Annotation Process of ReTabAD. We preserve semantically rich tabular values and augment them with structured textual metadata, enabling rigorous evaluation of context-aware AD.
  • Figure 2: Zero-shot LLM Baseline Overview. We design prompts to evaluate the role of semantic metadata by incorporating domain knowledge ($\mathcal{C}_{domain}$), feature descriptions ($\mathcal{C}_{feature}$), and normal statistics ($\mathcal{C}_{statistic}$). The LLM generates outputs including anomaly scores, key features, and anomaly reasoning.
  • Figure 3: Quantitative Evaluation of LLM Reasoning via Performance Gain. Results are reported on the representative cirrhosis dataset.
  • Figure 4: Reasoning Text Examples on the cirrhosis dataset. Left: JSON example. Right: Comparison of Type A (numeric deviation only) and Type D (domain-grounded explanation).
  • Figure 5: Type A Prompt Examples
  • ...and 3 more figures
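
The prompt design summarized in the Figure 2 caption can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the `Metadata` class, the `build_prompt` and `parse_response` functions, the section headings, the JSON field names, and the 0-to-1 score range are all assumptions, and the example feature descriptions and statistics are placeholders rather than values from the benchmark.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Metadata:
    """Hypothetical container for the three context blocks named in Figure 2."""
    domain: str                   # C_domain: free-text domain knowledge
    features: Dict[str, str]      # C_feature: feature name -> description
    normal_stats: Dict[str, str]  # C_statistic: feature name -> summary of normal data


def build_prompt(row: Dict[str, object], meta: Metadata,
                 use_domain: bool = True, use_feature: bool = True,
                 use_statistic: bool = True) -> str:
    """Assemble a zero-shot prompt; each context block can be toggled off
    to study its contribution (the toggles are an assumption of this sketch)."""
    parts: List[str] = []
    if use_domain:
        parts.append("Domain knowledge:\n" + meta.domain)
    if use_feature:
        lines = [f"- {name}: {desc}" for name, desc in meta.features.items()]
        parts.append("Feature descriptions:\n" + "\n".join(lines))
    if use_statistic:
        lines = [f"- {name}: {stat}" for name, stat in meta.normal_stats.items()]
        parts.append("Statistics of normal samples:\n" + "\n".join(lines))
    parts.append("Sample:\n" + json.dumps(row))
    parts.append('Respond with JSON containing "anomaly_score" (0 to 1), '
                 '"key_features" (list of feature names), and "reasoning" (string).')
    return "\n\n".join(parts)


def parse_response(text: str) -> Dict[str, object]:
    """Validate the three outputs listed in the caption (assumed JSON schema)."""
    out = json.loads(text)
    score = float(out["anomaly_score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError("anomaly_score outside the assumed [0, 1] range")
    return {"anomaly_score": score,
            "key_features": list(out["key_features"]),
            "reasoning": str(out["reasoning"])}


# Placeholder metadata and sample; none of these values come from the paper.
meta = Metadata(
    domain="Clinical records of patients with liver disease.",
    features={"Bilirubin": "serum bilirubin level", "Albumin": "serum albumin level"},
    normal_stats={"Bilirubin": "placeholder summary", "Albumin": "placeholder summary"},
)
prompt = build_prompt({"Bilirubin": 14.5, "Albumin": 2.6}, meta)
```

Sending `prompt` to a model and passing the reply through `parse_response` would complete the loop; the model call is omitted here because the source does not specify an API.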