CatBack: Universal Backdoor Attacks on Tabular Data via Categorical Encoding

Behrad Tajalli, Stefanos Koffas, Stjepan Picek

TL;DR

CatBack introduces a universal backdoor for tabular data by encoding categorical features into real-valued representations, enabling a gradient-based trigger to affect all features while preserving clean-data performance. It constructs a universal perturbation, optimizes it via an elastic-net objective, poisons a fraction of the training data, and reverts the data for standard training, achieving near-perfect attack success across five datasets and four model families. The approach transfers to black-box settings and real-world cloud systems like Google Vertex AI and remains robust against prominent defenses, exposing a critical security gap in tabular ML pipelines. These findings motivate tabular-specific defenses and evaluation metrics to secure high-stakes domains such as finance and healthcare.

Abstract

Backdoor attacks in machine learning have drawn significant attention for their potential to compromise models stealthily, yet most research has focused on homogeneous data such as images. In this work, we propose a novel backdoor attack on tabular data, a setting that is particularly challenging because it mixes numerical and categorical features. Our key idea is a novel technique for converting categorical values into floating-point representations. Compared with traditional methods such as one-hot or ordinal encoding, this approach preserves enough information to maintain clean-model accuracy. It lets us create a gradient-based universal perturbation that applies to all features, including categorical ones. We evaluate our method on five datasets and four popular models. Our results show up to a 100% attack success rate in both white-box and black-box settings (including real-world applications like Vertex AI), revealing a severe vulnerability for tabular data. Our method outperforms previous work such as Tabdoor while remaining stealthy against state-of-the-art defense mechanisms. We evaluate our attack against Spectral Signatures, Neural Cleanse, Beatrix, and Fine-Pruning, all of which fail to defend against it. We also verify that our attack bypasses popular outlier detection mechanisms.
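
To make the idea of a float-valued categorical encoding concrete, the following is a minimal sketch of a round trip between categories and floats. It is not the paper's encoding, which this summary does not specify: the frequency-ranked, evenly spaced codes and the helper names `fit_codes`, `encode`, and `revert` are assumptions chosen for illustration. The point is only that once categories live on a real-valued axis, an additive perturbation can move a sample from one category to another, and the result can be snapped back to a valid category afterwards.

```python
import numpy as np

def fit_codes(column):
    """Assign each category a float in [0, 1], most frequent first.

    Hypothetical scheme for illustration; not the encoding used by CatBack.
    """
    values, counts = np.unique(column, return_counts=True)
    order = np.argsort(-counts, kind="stable")  # rank categories by frequency
    ranked = values[order]
    if len(ranked) > 1:
        codes = np.linspace(0.0, 1.0, num=len(ranked))
    else:
        codes = np.array([0.0])
    return ranked, codes

def encode(column, ranked, codes):
    """Replace every category with its float code."""
    lookup = {cat: code for cat, code in zip(ranked, codes)}
    return np.array([lookup[c] for c in column], dtype=float)

def revert(encoded, ranked, codes):
    """Snap each (possibly perturbed) float to the nearest category code."""
    nearest = np.abs(encoded[:, None] - codes[None, :]).argmin(axis=1)
    return ranked[nearest]

column = np.array(["a", "b", "a", "c", "a", "b"])
ranked, codes = fit_codes(column)
encoded = encode(column, ranked, codes)

# A small shift leaves every category unchanged after reverting,
# while a larger shift pushes samples into a neighbouring category.
unchanged = revert(encoded + 0.1, ranked, codes)
shifted = revert(encoded + 0.3, ranked, codes)
```

In this toy scheme the spacing between codes determines how large a perturbation must be before a category flips, which is one reason the choice of encoding matters for a gradient-based trigger.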

Paper Structure

This paper contains 57 sections, 33 equations, 9 figures, 16 tables, 1 algorithm.

Figures (9)

  • Figure 1: CatBack's schematic. First, the attacker trains a model $F$ on the dataset transformed with our encoding. Then, the attacker selects samples close to the decision boundary of the target class $t$. Starting from a trigger randomly initialized with values drawn from a normal distribution, the attacker poisons a fraction of the dataset. Finally, the attacker reverts the dataset to its original representation so users can use it to train their own models.
  • Figure 2: Spectral Signatures: target class 1, Model: FTT, Dataset: BM, $\epsilon = 0.05$.
  • Figure 3: Spectral Signatures: target class 1, Model: FTT, Dataset: CovType, $\epsilon = 0.05$.
  • Figure 4: Race and Relationship values before and after poisoning in the ACI dataset.
  • Figure 5: CatBack's ASR and CDA vs. the poisoning rate for different $\mu$ using the ACI dataset.
  • ...and 4 more figures
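
The flow in the Figure 1 caption can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the surrogate is a fixed logistic model with hypothetical weights `w` and bias `b`, the elastic-net penalty is written in its generic form (an L1 term plus a squared L2 term) with hypothetical weights `l1_weight` and `l2_weight`, and relabeling poisoned rows to the target class is assumed rather than taken from the source. Reverting encoded categorical columns to their original categories is omitted here.

```python
import numpy as np

def optimize_trigger(X, w, b, l1_weight=0.01, l2_weight=0.01,
                     lr=0.1, steps=200, seed=0):
    """Fit one shared perturbation that pushes all rows toward class 1.

    Assumes a fixed logistic surrogate sigmoid(X @ w + b); the penalty is a
    generic elastic net (L1 + squared L2), not the paper's exact objective.
    """
    rng = np.random.default_rng(seed)
    delta = rng.normal(scale=0.01, size=X.shape[1])  # random normal init
    for _ in range(steps):
        logits = (X + delta) @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))
        # Gradient of mean(-log p) with respect to delta is mean(p - 1) * w.
        grad_loss = np.mean(p - 1.0) * w
        grad_penalty = l1_weight * np.sign(delta) + 2.0 * l2_weight * delta
        delta -= lr * (grad_loss + grad_penalty)
    return delta

def poison(X, y, delta, w, b, target, rate):
    """Apply the trigger to the rows closest to the surrogate's boundary.

    Relabeling the poisoned rows to `target` is an assumption of this sketch.
    """
    n_poison = int(round(rate * len(X)))
    # Smallest |logit| means closest to the surrogate decision boundary.
    idx = np.argsort(np.abs(X @ w + b))[:n_poison]
    X_poisoned, y_poisoned = X.copy(), y.copy()
    X_poisoned[idx] += delta
    y_poisoned[idx] = target
    return X_poisoned, y_poisoned, idx

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)
w, b = np.array([1.0, -2.0, 0.5]), 0.0  # hypothetical surrogate parameters

delta = optimize_trigger(X, w, b)
X_poisoned, y_poisoned, idx = poison(X, y, delta, w, b, target=1, rate=0.05)
```

Because the same `delta` is added to every poisoned row, the trigger is universal in the sense used above: it does not depend on the individual sample it is applied to.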