Enhancing Robustness of Machine Learning Systems via Data Transformations
Arjun Nitin Bhagoji, Daniel Cullina, Chawin Sitawarin, Prateek Mittal
TL;DR
The paper tackles the vulnerability of ML systems to evasion attacks by introducing data transformations, notably PCA-based dimensionality reduction and anti-whitening, as a defense that is classifier- and domain-agnostic. By training and deploying models on transformed data, the approach increases the perturbation required for successful adversarial examples and reduces attack success across SVMs and neural networks on MNIST and HAR, with only modest declines in benign accuracy. The key contribution is a practical, scalable defense that complements existing strategies such as adversarial training, offering tunable security-utility tradeoffs and general applicability across architectures and domains. The work demonstrates substantial robustness gains against multiple attack families, including white-box, classifier-mismatch, and architecture-mismatch settings, while maintaining reasonable computational efficiency, and it lays groundwork for future enhancements via more sophisticated transformations and combinations with other defenses.
Abstract
We propose the use of data transformations as a defense against evasion attacks on ML classifiers. We present and investigate strategies for incorporating a variety of data transformations, including dimensionality reduction via Principal Component Analysis and data 'anti-whitening', to enhance the resilience of machine learning, targeting both the classification and the training phases. We empirically evaluate and demonstrate the feasibility of linear transformations of data as a defense mechanism against evasion attacks using multiple real-world datasets. Our key findings are that the defense is (i) effective against the best known evasion attacks from the literature, resulting in a two-fold increase in the resources required by a white-box adversary with knowledge of the defense for a successful attack, (ii) applicable across a range of ML classifiers, including Support Vector Machines and Deep Neural Networks, and (iii) generalizable to multiple application domains, including image classification and human activity classification.
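The core idea of training and classifying on PCA-reduced data can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`fit_pca`, `transform`, `fit_centroids`, `predict`) are hypothetical, synthetic Gaussian data stands in for MNIST or HAR, a nearest-centroid rule stands in for the SVMs and neural networks evaluated in the paper, and the choice of k = 5 retained components is arbitrary. Anti-whitening is not shown.

```python
import numpy as np

def fit_pca(X, k):
    """Return the training mean and top-k principal directions (rows are samples)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def transform(X, mean, components):
    """Project data onto the retained principal directions."""
    return (X - mean) @ components.T

def fit_centroids(Z, y):
    """Stand-in classifier (assumption): one centroid per class in reduced space."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(Z, classes, centroids):
    """Assign each sample to the class with the nearest centroid."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def make_data(rng, n, d):
    """Synthetic two-class data (assumption): classes differ along coordinate 0."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, 0] += 6.0 * y
    return X, y

rng = np.random.default_rng(0)
n, d, k = 200, 50, 5
X_train, y_train = make_data(rng, n, d)
X_test, y_test = make_data(rng, n, d)

# Training phase: fit the transformation on training data only,
# then train the classifier on the reduced representation.
mean, components = fit_pca(X_train, k)
Z_train = transform(X_train, mean, components)
classes, centroids = fit_centroids(Z_train, y_train)

# Classification phase: every input passes through the same transformation.
Z_test = transform(X_test, mean, components)
accuracy = float(np.mean(predict(Z_test, classes, centroids) == y_test))
print(f"benign accuracy on reduced data: {accuracy:.3f}")
```

The transformation is fitted once on the training set and reused unchanged at classification time, so the same projection is applied to any input the deployed classifier sees, including adversarially perturbed ones. The number of retained components k is the tunable parameter in this sketch.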
