Enhancing Robustness of Machine Learning Systems via Data Transformations

Arjun Nitin Bhagoji, Daniel Cullina, Chawin Sitawarin, Prateek Mittal

TL;DR

The paper addresses the vulnerability of ML systems to evasion attacks by proposing data transformations, notably PCA-based dimensionality reduction and anti-whitening, as a defense that is classifier- and domain-agnostic. Models are trained and deployed on the transformed data, which increases the perturbation required for a successful adversarial example and reduces attack success for SVMs and neural networks on the MNIST and HAR (human activity recognition) datasets, with only modest declines in benign accuracy. The key contribution is a practical, scalable defense that complements existing strategies such as adversarial training and offers a tunable security-utility tradeoff across architectures and domains. The work reports robustness gains against multiple attack settings, including white-box, classifier-mismatch, and architecture-mismatch attacks, while remaining computationally efficient, and it lays groundwork for more sophisticated transformations and for combinations with other defenses.

Abstract

We propose the use of data transformations as a defense against evasion attacks on ML classifiers. We present and investigate strategies for incorporating a variety of data transformations including dimensionality reduction via Principal Component Analysis and data 'anti-whitening' to enhance the resilience of machine learning, targeting both the classification and the training phase. We empirically evaluate and demonstrate the feasibility of linear transformations of data as a defense mechanism against evasion attacks using multiple real-world datasets. Our key findings are that the defense is (i) effective against the best known evasion attacks from the literature, resulting in a two-fold increase in the resources required by a white-box adversary with knowledge of the defense for a successful attack, (ii) applicable across a range of ML classifiers, including Support Vector Machines and Deep Neural Networks, and (iii) generalizable to multiple application domains, including image classification and human activity classification.
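
As a rough illustration of the PCA-based variant described in the abstract, the following minimal sketch fits PCA on training data, trains a linear classifier on the top $k$ components, and applies the same projection at classification time. It is not the authors' implementation: the synthetic data, the least-squares classifier (a stand-in for a linear SVM), and the helper names `fit_pca`, `project`, `fit_linear`, and `scores` are all assumptions made for this example.

```python
import numpy as np

def fit_pca(X, k):
    """Return the training mean and the top-k principal directions, shape (d, k)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def project(X, mean, U_k):
    """Map inputs to their coordinates in the top-k principal component basis."""
    return (X - mean) @ U_k

def fit_linear(Z, y):
    """Least-squares linear classifier (assumed stand-in for a linear SVM)."""
    A = np.hstack([Z, np.ones((Z.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def scores(X, mean, U_k, w, b):
    """Decision scores of the defended pipeline: project, then classify."""
    return project(X, mean, U_k) @ w + b

# Synthetic two-class data (hypothetical): the classes differ along one
# high-variance direction, and the remaining directions are low-variance noise.
rng = np.random.default_rng(0)
n, d, k = 200, 20, 2
y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
X = 0.1 * rng.normal(size=(n, d))
X[:, 0] += 3.0 * y

# Training phase: fit the transformation, then train on the transformed data.
mean, U_k = fit_pca(X, k)
w, b = fit_linear(project(X, mean, U_k), y)
accuracy = np.mean(np.sign(scores(X, mean, U_k, w, b)) == y)

# Classification phase: a perturbation with no component in the retained
# subspace is removed by the projection and leaves the scores unchanged.
delta = rng.normal(size=d)
delta -= U_k @ (U_k.T @ delta)
shift = np.abs(scores(X + delta, mean, U_k, w, b) - scores(X, mean, U_k, w, b)).max()
```

In this sketch, `k` plays the role of the tunable parameter: a smaller `k` discards more directions an adversary could perturb, at the possible cost of benign accuracy.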

Paper Structure

This paper contains 62 sections, 16 equations, 14 figures, 3 tables, 1 algorithm.

Figures (14)

  • Figure 1: Comparison of benign and adversarial images taken from the MNIST dataset.
  • Figure 2: Magnitudes of the coefficients of the weight vector $\mathbf{w}$ of a linear SVM in the principal component basis. On the horizontal axis, we have $\sqrt{\lambda_i}$. On the vertical axis, $|(\mathbf{U}^{\mkern-1.5mu\mathsf{T}\mkern-1.5mu}\mathbf{w})_i|$. The classifier is trained on the original MNIST data.
  • Figure 3: The magnitudes of the coefficients of the weight vector $\mathbf{w}$ of a linear SVM in the principal component basis. On the horizontal axis, we have $\sqrt{\lambda_i}$. On the vertical axis, $|(\mathbf{U}^{\mkern-1.5mu\mathsf{T}\mkern-1.5mu}\mathbf{w})_i|$. The classifiers are trained on the MNIST data projected onto the top $k$ principal components.
  • Figure 4: Effectiveness of the defense in classifier mismatch setting for the MNIST dataset with Linear SVMs. The adversarial example success on the MNIST dataset is plotted versus the perturbation magnitude $\epsilon= \|\mathbf{x} - \tilde{\mathbf{x}} \|_2$. The attack is performed against the original classifier and the effect of the defense is plotted for each reduced dimension $k$.
  • Figure 5: Effectiveness of the defense for the MNIST dataset against optimal white-box attacks on Linear SVMs.
  • ...and 9 more figures
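
The quantities named in the captions of Figures 2 to 4 can be computed as in the sketch below: $\sqrt{\lambda_i}$ from the eigenvalues of the data covariance, $|(\mathbf{U}^{\mathsf{T}}\mathbf{w})_i|$ as the weight vector expressed in the principal component basis, and $\epsilon = \|\mathbf{x} - \tilde{\mathbf{x}}\|_2$ as the perturbation magnitude. The data, the weight vector `w`, and the perturbed point `x_adv` are hypothetical placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data with a different spread along each coordinate.
X = rng.normal(size=(500, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
# Hypothetical weight vector standing in for a trained linear SVM's w.
w = rng.normal(size=5)

# Eigendecomposition of the sample covariance: columns of U are principal
# directions, lam holds the corresponding variances.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)
lam, U = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]         # reorder to descending variance
lam, U = lam[order], U[:, order]

sqrt_lam = np.sqrt(lam)               # horizontal axis in Figures 2 and 3
coeffs = np.abs(U.T @ w)              # vertical axis: |(U^T w)_i|

# Perturbation magnitude as defined in the Figure 4 caption.
x = X[0]
x_adv = x + 0.3 * w / np.linalg.norm(w)   # hypothetical perturbed input
eps = np.linalg.norm(x - x_adv)
```

Because `U` is orthogonal, changing basis preserves the norm of `w`; the plots described in the captions show how that norm is distributed across high- and low-variance components.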