TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases

Thibault Simonetto; Salah Ghamizi; Maxime Cordy

TL;DR

TabularBench introduces the first comprehensive benchmark for the adversarial robustness of deep tabular learning models under domain constraints, addressing a critical gap for real-world deployment. The framework uses the Constrained Adaptive Attack (CAA) to evaluate five architectures (TabTransformer, TabNet, RLN, STG, VIME) across five datasets (CTU, LCLD, Malware, URL, WIDS). It also combines adversarial training with diverse data augmentations via a modular API. Key contributions include a public leaderboard with 200+ evaluations, a Dataset Zoo of real and synthetic constrained data, a Model Zoo of pretrained robust models, and empirical insights that guide architecture choice and defense design under realistic constraints. Robustness is measured against constrained perturbations with $L_2$ budgets. The benchmark aims to accelerate robust tabular deep learning research and to support reproducible, practical deployment in finance, healthcare, and security.
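
The TL;DR mentions constrained perturbations under an $L_2$ budget. The sketch below illustrates what keeping a candidate input inside such a budget can look like. It is a minimal sketch, not TabularBench's API or the CAA implementation. The function names, the box bounds `lower`/`upper`, and the `mutable` mask are hypothetical. Only simple constraints are handled: the budget, feature ranges, and immutable features. Relational constraints between features are not covered.

```python
import numpy as np


def project_l2(x: np.ndarray, x_adv: np.ndarray, eps: float) -> np.ndarray:
    """Project x_adv onto the L2 ball of radius eps (the attack budget) centred at x."""
    delta = x_adv - x
    norm = np.linalg.norm(delta)
    if norm > eps:
        delta = delta * (eps / norm)
    return x + delta


def constrain_perturbation(
    x: np.ndarray,
    x_adv: np.ndarray,
    eps: float,
    lower: np.ndarray,
    upper: np.ndarray,
    mutable: np.ndarray,
) -> np.ndarray:
    """Hypothetical helper: keep a candidate within the L2 budget,
    the feature bounds, and the features the attacker may change.

    Assumes x itself lies inside [lower, upper]. In that case clipping and
    resetting immutable features can only shrink the perturbation, so the
    result stays inside the L2 ball.
    """
    x_adv = project_l2(x, x_adv, eps)
    x_adv = np.clip(x_adv, lower, upper)  # feature-range constraints
    return np.where(mutable, x_adv, x)    # immutable features keep their clean value


# Usage: a budget of 1.0 and a middle feature the attacker cannot change.
x_clean = np.zeros(3)
out = constrain_perturbation(
    x_clean,
    np.array([3.0, 4.0, 0.0]),
    eps=1.0,
    lower=np.full(3, -1.0),
    upper=np.full(3, 1.0),
    mutable=np.array([True, False, True]),
)
print(out)  # approximately [0.6, 0.0, 0.0]
```

Projecting first and then applying the box and mutability masks is one simple ordering. It is valid here only because of the stated assumption that the clean input lies inside the bounds.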

Abstract

While adversarial robustness in computer vision is a mature research field, fewer researchers have tackled evasion attacks against tabular deep learning, and even fewer have investigated robustification mechanisms and reliable defenses. We hypothesize that this lag in research on tabular adversarial attacks is partly due to the lack of standardized benchmarks. To fill this gap, we propose TabularBench, the first comprehensive benchmark of the robustness of tabular deep learning classification models. We evaluate adversarial robustness with CAA, an ensemble of gradient and search attacks that was recently shown to be the most effective attack against tabular models. Our benchmark is open (https://github.com/serval-uni-lu/tabularbench), and we welcome submissions of new models and defenses. We also implement 7 robustification mechanisms inspired by state-of-the-art defenses in computer vision. On this basis, we propose the largest benchmark of robust tabular deep learning, with over 200 models across five critical scenarios in finance, healthcare, and security. We curated real datasets for each use case and augmented them with hundreds of thousands of realistic synthetic inputs. We then trained and assessed our models with and without data augmentations. We open-source our library, which provides API access to all our pre-trained robust tabular models and to the largest datasets of real and synthetic tabular inputs. Finally, we analyze the impact of various defenses on robustness and provide actionable insights for designing new defenses and robustification mechanisms.
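
The abstract describes CAA as an ensemble of gradient and search attacks. The sketch below shows one common way to score robustness against such an ensemble. An input counts as robust only if it is classified correctly and no attack in the ensemble finds a misclassified candidate that also satisfies the domain constraints. This is an illustration under assumptions, not the benchmark's code. The function `ensemble_robust_accuracy`, the callable signatures, and the toy model and attack are all hypothetical.

```python
from typing import Callable, Sequence

import numpy as np

Predict = Callable[[np.ndarray], np.ndarray]
Attack = Callable[[np.ndarray, np.ndarray], np.ndarray]
Validity = Callable[[np.ndarray], np.ndarray]


def ensemble_robust_accuracy(
    predict: Predict,
    attacks: Sequence[Attack],
    x: np.ndarray,
    y: np.ndarray,
    is_valid: Validity,
) -> float:
    """Fraction of inputs that stay correctly classified under every attack.

    An attack only succeeds on an input if its candidate is misclassified
    AND satisfies the domain constraints (as judged by is_valid).
    """
    robust = predict(x) == y             # misclassified clean inputs never count
    for attack in attacks:
        idx = np.flatnonzero(robust)     # run later attacks only on survivors
        if idx.size == 0:
            break
        x_adv = attack(x[idx], y[idx])
        fooled = (predict(x_adv) != y[idx]) & is_valid(x_adv)
        robust[idx[fooled]] = False
    return float(robust.mean())


# Toy usage: a threshold model and an attack that pushes inputs across it.
def toy_predict(z: np.ndarray) -> np.ndarray:
    return (z[:, 0] > 0).astype(int)


def toy_shift(z: np.ndarray, t: np.ndarray) -> np.ndarray:
    return z - 1.5 * (2 * t - 1)[:, None]


x_toy = np.array([[1.0], [2.0], [-1.0]])
y_toy = np.array([1, 1, 0])
acc_unconstrained = ensemble_robust_accuracy(
    toy_predict, [toy_shift], x_toy, y_toy, lambda z: np.ones(len(z), dtype=bool)
)
acc_constrained = ensemble_robust_accuracy(
    toy_predict, [toy_shift], x_toy, y_toy, lambda z: z[:, 0] <= 0
)
print(acc_unconstrained, acc_constrained)  # 1/3 and 2/3
```

In the toy run, the constraint invalidates one adversarial candidate. Robust accuracy therefore rises once constraint satisfaction is required. This is why the validity check belongs inside the success criterion rather than being applied afterwards.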

Paper Structure

This paper contains 57 sections, 1 equation, 11 figures, and 8 tables.

Introduction
Contributions.
Background
TabularBench: Adversarial Robustness Benchmark for Tabular Data
Tasks
Architectures
Data Augmentation
TabularBench API
Empirical Findings
Without Data Augmentations
Impact of Data Augmentations
Impact of Adversarial Training
Impact of Architecture
Impact of Attack Budgets
Limitations
...and 42 more sections

Figures (11)

Figure 1: The main challenges for adversarial attacks in tabular machine learning. When an adversary perturbs some features (red), it may be unaware of the new features that are computed internally and added (blue), or of the relationships between features (green). If the monitoring system detects a constraint violation, the input is quarantined and a rejection (1) is returned.
Figure 2: Summary of our main experiments (Y-axis: robust accuracy; X-axis: ID accuracy).
Figure 3: Robust performance with domain constraints (ADV+CTR, Y-axis) versus without them (ADV, X-axis) on all our use cases, confirming the relevance of studying constraint-aware attacks.
Figure 4: Impact of attack budget on the robust accuracy for URL dataset.
Figure 5: Impact of attack budget on the robust accuracy for LCLD dataset.
...and 6 more figures
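
Figure 1 describes a deployed system with three parts. It recomputes internal features from the raw input, checks relationships between features, and rejects inputs that violate them. The sketch below mimics that pipeline on a hypothetical three-column input. It is only an illustration of the idea in the caption, not the paper's monitoring system. The feature layout, the sum and non-negativity relations, the appended ratio feature, and the `REJECTED` code are all assumptions made for the example.

```python
import numpy as np

REJECTED = -1  # hypothetical output code for a quarantined input


def violates_relations(raw: np.ndarray, tol: float = 1e-6) -> np.ndarray:
    """Hypothetical relations between raw features (green in Figure 1):
    column 2 is a total that must equal column 0 + column 1,
    and column 1 must be non-negative."""
    total_mismatch = np.abs(raw[:, 2] - (raw[:, 0] + raw[:, 1])) > tol
    return total_mismatch | (raw[:, 1] < 0)


def add_computed_features(raw: np.ndarray) -> np.ndarray:
    """Hypothetical feature computed internally (blue in Figure 1):
    the ratio of column 0 to column 1, appended as a new column."""
    ratio = raw[:, 0] / np.maximum(raw[:, 1], 1e-9)
    return np.column_stack([raw, ratio])


def monitored_predict(predict, raw: np.ndarray) -> np.ndarray:
    """Reject constraint-violating inputs; classify the rest on the
    enriched feature vector that the attacker does not control directly."""
    out = np.full(len(raw), REJECTED)
    ok = ~violates_relations(raw)
    if ok.any():
        out[ok] = predict(add_computed_features(raw[ok]))
    return out


def toy_model(features: np.ndarray) -> np.ndarray:
    return (features[:, 3] > 0.4).astype(int)  # decides on the computed ratio


raw = np.array([
    [1.0, 2.0, 3.0],  # consistent: 1 + 2 == 3
    [1.0, 2.0, 4.0],  # perturbed total breaks the relation
])
print(monitored_predict(toy_model, raw))  # [ 1 -1]
```

The second row shows the failure mode in the caption. Perturbing a single feature without respecting its relation to the others gets the input rejected instead of misclassified.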
