TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases
Thibault Simonetto, Salah Ghamizi, Maxime Cordy
TL;DR
TabularBench introduces the first comprehensive benchmark for adversarial robustness of deep tabular learning models under domain constraints, addressing a critical gap for real-world deployment. The framework uses Constrained Adaptive Attack (CAA) to evaluate five architectures (TabTransformer, TabNet, RLN, STG, and VIME) across five datasets (CTU, LCLD, Malware, URL, and WIDS) and combines adversarial training with diverse data augmentations via a modular API. Key contributions include a public leaderboard with 200+ evaluations, a Dataset Zoo of real and synthetically constrained data, a Model Zoo of pretrained robust models, and empirical insights that guide architecture choice and defense design under realistic constraints. The benchmark aims to accelerate robust tabular DL research and to support reproducible, practical deployment in finance, healthcare, and security, with robustness evaluated under constrained perturbations bounded by an $L_2$ budget.
Abstract
While adversarial robustness in computer vision is a mature research field, fewer researchers have tackled evasion attacks against tabular deep learning, and even fewer have investigated robustification mechanisms and reliable defenses. We hypothesize that this lag in research on tabular adversarial attacks is in part due to the lack of standardized benchmarks. To fill this gap, we propose TabularBench, the first comprehensive benchmark of robustness of tabular deep learning classification models. We evaluate adversarial robustness with CAA, an ensemble of gradient and search attacks that was recently demonstrated to be the most effective attack against tabular models. In addition to our open benchmark (https://github.com/serval-uni-lu/tabularbench), where we welcome submissions of new models and defenses, we implement 7 robustification mechanisms inspired by state-of-the-art defenses in computer vision and propose the largest benchmark of robust tabular deep learning, covering over 200 models across five critical scenarios in finance, healthcare and security. We curated real datasets for each use case, augmented them with hundreds of thousands of realistic synthetic inputs, and trained and assessed our models with and without data augmentations. We open-source our library, which provides API access to all our pre-trained robust tabular models and to the largest datasets of real and synthetic tabular inputs. Finally, we analyze the impact of various defenses on robustness and provide actionable insights for designing new defenses and robustification mechanisms.
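To make "robustness under domain constraints" concrete, the following is a minimal sketch of how robust accuracy might be computed when an adversarial candidate only counts if it stays within an $L_2$ budget and still satisfies the domain constraints. It is not the TabularBench API and does not implement CAA: the function names, the toy classifier, and the constraint (non-negative features, with feature 0 not exceeding feature 1) are all hypothetical, and the adversarial candidates are assumed to be given by some attack.

```python
import numpy as np

def within_l2_budget(x, x_adv, eps):
    """True if the perturbation x_adv - x has L2 norm at most eps."""
    return bool(np.linalg.norm(x_adv - x) <= eps + 1e-9)

def satisfies_constraints(x):
    """Hypothetical domain constraint (an assumption for illustration):
    all features are non-negative and feature 0 does not exceed feature 1."""
    return bool(np.all(x >= 0) and x[0] <= x[1])

def is_successful_attack(predict, x, x_adv, label, eps):
    """A candidate succeeds only if it is in budget, valid, and misclassified."""
    return (
        within_l2_budget(x, x_adv, eps)
        and satisfies_constraints(x_adv)
        and predict(x_adv) != label
    )

def robust_accuracy(predict, X, X_adv, y, eps):
    """Fraction of samples that are correctly classified and for which the
    given adversarial candidate is not a successful attack."""
    robust = 0
    for x, x_adv, label in zip(X, X_adv, y):
        correct = predict(x) == label
        if correct and not is_successful_attack(predict, x, x_adv, label, eps):
            robust += 1
    return robust / len(X)

# Toy stand-in for a trained model (hypothetical): class 1 if features sum above 1.
def predict(x):
    return int(x[0] + x[1] > 1.0)

X = np.array([[0.20, 0.5], [0.6, 0.9], [0.45, 0.5]])
y = np.array([0, 1, 0])
# Candidate adversarial examples, assumed to come from some attack.
X_adv = np.array([
    [0.4, 0.7],  # in budget, valid, flips the prediction: successful
    [0.6, 0.3],  # flips the prediction but exceeds the L2 budget
    [0.6, 0.5],  # in budget and flips, but violates the domain constraint
])

acc = robust_accuracy(predict, X, X_adv, y, eps=0.3)
print(f"robust accuracy: {acc:.3f}")
```

In this toy setup only the first candidate counts as a successful attack. The other two change the prediction as well, but one exceeds the budget and the other violates the domain constraint, which is the distinction a constrained evaluation is meant to capture.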
