Investigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical Analysis

Zhipeng He; Chun Ouyang; Laith Alzubaidi; Alistair Barros; Catarina Moreira

TL;DR

The paper proposes a set of key properties and corresponding metrics to comprehensively characterise imperceptible adversarial attacks on tabular data; applying them to existing attacks reveals a trade-off between the imperceptibility and effectiveness of these attacks.

Abstract

Adversarial attacks pose a potential threat to machine learning models by causing incorrect predictions through imperceptible perturbations to the input data. While these attacks have been extensively studied on unstructured data such as images, applying them to tabular data poses new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies of tabular data, which distinguish it from image data. To account for this distinction, it is necessary to establish imperceptibility criteria tailored to tabular data. However, there is currently a lack of standardised metrics for assessing the imperceptibility of adversarial attacks on tabular data. To address this gap, we propose a set of key properties and corresponding metrics designed to comprehensively characterise imperceptible adversarial attacks on tabular data. These are: proximity to the original input, sparsity of altered features, deviation from the original data distribution, sensitivity in perturbing features with narrow distribution, immutability of certain features that should remain unchanged, feasibility of specific feature values that should not go beyond valid practical ranges, and feature interdependencies capturing complex relationships between data attributes. We evaluate the imperceptibility of five adversarial attacks, including both bounded and unbounded attacks, on tabular data using the proposed imperceptibility metrics. The results reveal a trade-off between the imperceptibility and effectiveness of these attacks. The study also identifies limitations in current attack algorithms, offering insights that can guide future research in the area. The findings from this empirical analysis provide direction for improving the design of adversarial attack algorithms, thereby advancing adversarial machine learning on tabular data.
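To make four of the properties listed in the abstract concrete, the following is a minimal Python sketch of proximity, sparsity, immutability and feasibility checks for a single record. It is an illustration under stated assumptions, not the metrics as defined in the paper: the function names, the tolerance `tol`, the example record, and the feature ranges are all hypothetical, and records are assumed to be already encoded as lists of numbers.

```python
import math


def proximity(x, x_adv, p=2):
    """Lp distance between an original record and its adversarial version."""
    diffs = [abs(a - b) for a, b in zip(x, x_adv)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def sparsity(x, x_adv, tol=1e-9):
    """Number of features changed by more than tol (an L0-style count)."""
    return sum(1 for a, b in zip(x, x_adv) if abs(a - b) > tol)


def immutability_violations(x, x_adv, immutable_idx, tol=1e-9):
    """Indices of features that should stay fixed but were perturbed."""
    return [i for i in immutable_idx if abs(x[i] - x_adv[i]) > tol]


def infeasible_features(x_adv, ranges):
    """Indices of features whose perturbed value leaves its valid range.

    ranges maps a feature index to an inclusive (low, high) pair.
    """
    return [i for i, (low, high) in ranges.items()
            if not low <= x_adv[i] <= high]


if __name__ == "__main__":
    # Hypothetical encoded record: [age, glucose, bmi, sex]
    x = [50.0, 148.0, 33.0, 1.0]
    x_adv = [50.0, 151.0, 29.0, 0.0]

    print("L2 proximity:", proximity(x, x_adv))
    print("Linf proximity:", proximity(x, x_adv, p=math.inf))
    print("Perturbed features:", sparsity(x, x_adv))
    # Assumes age (index 0) and sex (index 3) are treated as immutable.
    print("Immutable features changed:",
          immutability_violations(x, x_adv, immutable_idx=[0, 3]))
    # Assumed plausible ranges for glucose and bmi, for illustration only.
    print("Out-of-range features:",
          infeasible_features(x_adv, {1: (0.0, 300.0), 2: (10.0, 70.0)}))
```

Lower proximity and sparsity values indicate a perturbation that stays closer to the original record, while empty violation lists indicate that immutable features and value ranges were respected. The remaining properties (deviation, sensitivity, feature interdependencies) depend on dataset-level statistics and are not sketched here.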

Paper Structure (54 sections, 6 equations, 8 figures, 15 tables)

Introduction
Background and Related Work
Adversarial Attacks and Imperceptibility
State-of-the-art Adversarial Attacks on Tabular Data
Imperceptibility of Adversarial Attacks on Tabular Data
Establishing Criteria for Imperceptibility
Minimisation of feature perturbation
Preservation of statistical data distribution
Narrow-guard feature perturbation
Preservation of feature semantics
Preservation of feature interdependencies
Properties of Imperceptibility
Proximity
Proximity Metrics
Sparsity
...and 39 more sections

Figures (8)

Figure 1: The perturbation on tabular data is more noticeable than on images. An adversarial example for a sleeping koala shows how input perturbations caused by a typical adversarial attack, the Fast Gradient Sign Method (FGSM), can mislead an image recognition system while remaining indistinguishable to human eyes (the koala example). In contrast, the distinction between two tabular records for classifying the presence of diabetes can easily be observed or detected by humans (the diabetes example).
Figure 2: Three heatmaps visualising the total count of each individual feature being perturbed across different attack/model combinations over three mixed datasets, respectively. The X axis enumerates all the features (numerical features followed by categorical features in each dataset), while the Y axis enumerates all the attack/model combinations. The colour scale of each plot is determined by the number of generated adversarial examples in each individual dataset: 6,513 examples for the Adult dataset, 1,443 examples for the COMPAS dataset, and 200 examples for the German dataset. Overall, the four attacks (DeepFool, C&W, FGSM and PGD) primarily perturb numerical features with minimal or no changes to categorical features in the three mixed datasets. Due to insufficient effectiveness, the combination of the C&W attack and LinearSVC is not considered.
Figure 3: Heatmap showing the percentage of adversarial examples generated for each combination of model (LR, LinearSVC, MLP) and attack type (DeepFool, C&W, FGSM, PGD) that perturb immutable features: race, sex, and marital-status. The values are presented as a percentage of a total of 6,513 adversarial examples.
Figure 4: Feature weights (coefficients) of LR for the COMPAS dataset. The features are ranked by feature weights. Features with a positive weight contribute to "Medium-Low" risk and features with a negative weight contribute to "High" risk. Features with a weight in the range of $[-0.15,0.15]$ have little impact on model prediction and therefore are omitted.
Figure 5: Feature weights (coefficients) of LR for the Diabetes dataset. The features are ranked by feature weights. Features with a positive weight contribute to "has" diabetes and features with a negative weight contribute to "not have" diabetes.
...and 3 more figures

Theorems & Definitions (1)

definition 1: Adversarial Attack
