Non-Robust Features are Not Always Useful in One-Class Classification
Matthew Lau, Haoran Wang, Alec Helbling, Matthew Hull, ShengYun Peng, Martin Andreoni, Willian T. Lunardi, Wenke Lee
TL;DR
This paper investigates the adversarial vulnerability of lightweight one-class classifiers used for anomaly detection tasks, such as drone detection, under practical constraints. By adapting the non-robust feature (NRF) framework of Ilyas et al. (2019) to one-class tasks, it evaluates performance, robustness, and NRF usefulness across multiple backbones and adversarial drift scenarios, including unseen anomalies. Key findings show that while models learn useful features, smaller architectures degrade under drift and require stronger attacks to break robustness; importantly, non-robust features are not consistently useful for the one-class task, and NRF usefulness is not predicted by model size or robustness. The work highlights the risk of learning NRFs in lightweight one-class detectors and motivates methods that prevent learning such features, with the aim of improving deployment reliability in security-critical settings.
Abstract
The robustness of machine learning models has been questioned by the existence of adversarial examples. We examine the threat of adversarial examples in practical applications that require lightweight models for one-class classification. Building on Ilyas et al. (2019), we investigate the vulnerability of lightweight one-class classifiers to adversarial attacks and possible reasons for it. Our results show that lightweight one-class classifiers learn features that are not robust (e.g. texture) under stronger attacks. However, unlike in multi-class classification (Ilyas et al., 2019), these non-robust features are not always useful for the one-class task, suggesting that learning these unpredictive and non-robust features is an unwanted consequence of training.
