Crafting Imperceptible On-Manifold Adversarial Attacks for Tabular Data

Zhipeng He, Alexander Stevens, Chun Ouyang, Johannes De Smedt, Alistair Barros, Catarina Moreira

TL;DR

This paper tackles the challenge of creating imperceptible adversarial examples for tabular data by performing perturbations in a mixed-input latent space learned by a variational autoencoder. A mixed-input VAE with a classification head enables on-manifold perturbations that preserve the data distribution while deceiving classifiers, and the authors introduce the In-Distribution Success Rate (IDSR) metric to jointly assess attack effectiveness and distributional alignment. Across six datasets and three model architectures, the proposed method achieves superior reconstruction fidelity and higher IDSR than traditional input-space attacks and image-domain VAE baselines, though performance depends strongly on latent-space reconstruction quality and data availability. The work highlights the importance of manifold-aligned perturbations for realistic adversarial examples in tabular domains. It also analyses hyperparameter sensitivity, sparsity control, and generative-model choices, with implications for robustness evaluation and future generative-model-based attacks on mixed-type data.

Abstract

Adversarial attacks on tabular data present unique challenges due to the heterogeneous nature of mixed categorical and numerical features. Unlike images, where pixel perturbations maintain visual similarity, tabular data lacks intuitive similarity metrics, making it difficult to define imperceptible modifications. Additionally, traditional gradient-based methods prioritise $\ell_p$-norm constraints, often producing adversarial examples that deviate from the original data distributions. To address this, we propose a latent-space perturbation framework using a mixed-input Variational Autoencoder (VAE) to generate statistically consistent adversarial examples. The proposed VAE integrates categorical embeddings and numerical features into a unified latent manifold, enabling perturbations that preserve statistical consistency. We introduce In-Distribution Success Rate (IDSR) to jointly evaluate attack effectiveness and distributional alignment. Evaluation across six publicly available datasets and three model architectures demonstrates that our method achieves substantially lower outlier rates, higher IDSR, and more consistent performance than traditional input-space attacks and other VAE-based methods adapted from image-domain approaches. Our comprehensive analyses of hyperparameter sensitivity, sparsity control, and generative architecture demonstrate that the effectiveness of VAE-based attacks depends strongly on reconstruction quality and the availability of sufficient training data. When these conditions are met, the proposed framework achieves superior practical utility and stability compared with input-space methods. This work underscores the importance of maintaining on-manifold perturbations for generating realistic and robust adversarial examples in tabular domains.
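
The abstract describes IDSR only as a joint measure of attack effectiveness and distributional alignment, so the exact formula is not available here. As a rough illustration, the sketch below assumes IDSR is the fraction of inputs whose adversarial example both changes the classifier's prediction and is not flagged as an outlier. The function name `in_distribution_success_rate`, the boolean `is_outlier` flags (from some unspecified outlier detector), and the toy labels are all hypothetical.

```python
import numpy as np

def in_distribution_success_rate(y_true, y_adv_pred, is_outlier):
    """Assumed definition: share of examples that are misclassified
    after the attack AND are not flagged as outliers."""
    y_true = np.asarray(y_true)
    y_adv_pred = np.asarray(y_adv_pred)
    is_outlier = np.asarray(is_outlier, dtype=bool)
    success = y_adv_pred != y_true          # attack flipped the prediction
    return float(np.mean(success & ~is_outlier))

# Toy data: five attacked inputs (hypothetical values).
y_true = [0, 1, 1, 0, 1]
y_adv_pred = [1, 0, 1, 1, 0]
is_outlier = [False, True, False, False, False]

plain_success_rate = float(np.mean(np.asarray(y_adv_pred) != np.asarray(y_true)))
idsr = in_distribution_success_rate(y_true, y_adv_pred, is_outlier)
print(plain_success_rate, idsr)
```

Under this assumed definition, an attack that succeeds by leaving the data distribution is not counted, which is why the toy IDSR is lower than the plain success rate.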

Paper Structure

This paper contains 58 sections, 5 equations, 6 figures, 8 tables, 1 algorithm.

Figures (6)

  • Figure 1: (a) Illustration of applying the Fast Gradient Sign Method (FGSM) to tabular data, demonstrating the perturbation of categorical and numerical features. (b) A scatter plot showing that the adversarial examples generated by FGSM (green) deviate from the original input distribution (blue) on the Adult dataset.
  • Figure 2: Overall framework of the proposed VAE-based adversarial example generation. (a) VAE Training: Input features $x$ (categorical and numerical) are pre-processed via embedding and normalisation before being encoded into latent variables $(\mu, \sigma)$. The decoder reconstructs both numerical and categorical features under the joint VAE loss. An auxiliary classification head $h_\omega(z)$, shown as a grey node, is employed during training to promote a discriminative and well-structured latent space; it is not used during adversarial generation. (b) Adversarial Example Generation: After training, the encoder and decoder are frozen. For each input, a latent representation $z$ is obtained and iteratively perturbed by an instance-specific vector $\delta$ optimised via gradient descent on the C&W-style attack objective. The decoder reconstructs the perturbed latent vector into an adversarial example $\tilde{x}=p_\psi(z+\delta^*)$, which remains on the data manifold while manipulating the classifier's logits $Z(\tilde{x})$ to cross the decision boundary.
  • Figure 3: t-SNE visualisation of latent codes coloured by class for the Phishing dataset. Left: VAE without $\mathcal{L}_{\text{cls}}$ shows mixed clusters. Right: Proposed VAE with $\mathcal{L}_{\text{cls}}$ exhibits class-separable structure, explaining the lower $\delta_{\text{acc}}$.
  • Figure 4: Latent space visualisation (t-SNE) of original inputs (blue) and adversarial examples (green) generated by the VAE attack. Near-perfect overlap confirms distributional consistency, with perturbations constrained to the data manifold.
  • Figure 5: Success rate and perturbation magnitude as functions of learning rate ($\eta$) and $\lambda$ on the Adult dataset (MLP model). Higher $\lambda$ and $\eta$ increase success rates (success-rate panel) but also amplify perturbation magnitudes (perturbation-magnitude panel), with diminishing returns beyond $\eta = 0.2$. Optimal configurations balance mid-range $\lambda$ (e.g., $\lambda = 0.7$--$0.9$) with $\eta = 0.1$--$0.2$.
  • ...and 1 more figure
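
The generation step described in the Figure 2 caption (freeze the decoder, then optimise an instance-specific latent offset $\delta$ by gradient descent on a C&W-style objective) can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: the decoder and classifier are hand-picked linear maps, all features are numerical, and the objective is assumed to be $\|\delta\|_2^2 + \lambda \cdot \max(Z_y(\tilde{x}) - \max_{j \neq y} Z_j(\tilde{x}),\, 0)$. The names `latent_attack`, `dec_W`, and `clf_W` are hypothetical.

```python
import numpy as np

def latent_attack(z, y, dec_W, dec_b, clf_W, clf_b, lam=2.0, eta=0.1, steps=100):
    """Search for a small latent offset delta that flips a toy linear classifier.

    Returns the smallest successful delta seen during the search, or None.
    """
    delta = np.zeros_like(z)
    best = None
    for _ in range(steps):
        x_adv = dec_W @ (z + delta) + dec_b      # decode the perturbed latent
        logits = clf_W @ x_adv + clf_b           # classifier logits Z(x_adv)
        masked = logits.copy()
        masked[y] = -np.inf
        other = int(np.argmax(masked))           # strongest wrong class
        margin = logits[y] - logits[other]       # negative means misclassified
        if margin < 0 and (best is None
                           or np.linalg.norm(delta) < np.linalg.norm(best)):
            best = delta.copy()
        grad = 2.0 * delta                       # gradient of ||delta||^2
        if margin > 0:                           # hinge term is active
            grad = grad + lam * (dec_W.T @ (clf_W[y] - clf_W[other]))
        delta = delta - eta * grad               # gradient-descent step
    return best

# Toy frozen "decoder" (2-D latent -> 3 features) and 2-class linear classifier.
dec_W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dec_b = np.zeros(3)
clf_W = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
clf_b = np.zeros(2)

z, y = np.array([1.0, 0.0]), 0
delta_star = latent_attack(z, y, dec_W, dec_b, clf_W, clf_b)
x_adv = dec_W @ (z + delta_star) + dec_b
pred_clean = int(np.argmax(clf_W @ (dec_W @ z + dec_b) + clf_b))
pred_adv = int(np.argmax(clf_W @ x_adv + clf_b))
print(pred_clean, pred_adv, np.linalg.norm(delta_star))
```

Because every candidate is produced by the decoder, the adversarial example stays in the decoder's range, which is the toy analogue of remaining on the data manifold. A real mixed-input setting would also need to decode categorical features, which this sketch omits.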