Non-Robust Features are Not Always Useful in One-Class Classification

Matthew Lau, Haoran Wang, Alec Helbling, Matthew Hull, ShengYun Peng, Martin Andreoni, Willian T. Lunardi, Wenke Lee

TL;DR

This paper investigates the adversarial vulnerability of lightweight one-class classifiers used for anomaly detection (for example, drone detection) under practical constraints. It adapts the non-robust feature (NRF) framework of Ilyas et al. (2019) to one-class tasks and evaluates performance, robustness, and NRF usefulness across multiple backbones and adversarial drift scenarios, including unseen anomalies. The key findings are that models learn useful features, but smaller architectures degrade under drift and require stronger attacks before their robustness breaks. Non-robust features are not consistently useful for the one-class task, and NRF usefulness is not predicted by model size or robustness. The work highlights the risk of learning NRFs in lightweight one-class detectors and motivates methods that prevent learning such features, to improve deployment reliability in security-critical settings.

Abstract

The robustness of machine learning models has been questioned by the existence of adversarial examples. We examine the threat of adversarial examples in practical applications that require lightweight models for one-class classification. Building on Ilyas et al. (2019), we investigate the vulnerability of lightweight one-class classifiers to adversarial attacks and possible reasons for it. Our results show that lightweight one-class classifiers learn features that are not robust (e.g. texture) under stronger attacks. However, unlike in multi-class classification (Ilyas et al., 2019), these non-robust features are not always useful for the one-class task, suggesting that learning these unpredictive and non-robust features is an unwanted consequence of training.

Paper Structure (18 sections, 1 equation, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Evaluation framework of the usefulness of non-robust features (e.g. texture) on one-class classification, adapted from Ilyas et al. (2019).
  • Figure 2: Standard and robust performance of different backbones across a subset of negative subclasses and the overall negative class. Each group of bars represents the average precision of the negative (sub)class against the positive drone class for different classifiers. The first green bar is the random baseline, followed by pairs of bars which are the standard and robust performance of MobileNetV3 (Mob.), EfficientNetB0 (Eff.) and ResNet18 (Res.) in order. Classes are grouped by roughly increasing adversarial drift from left to right, with semantic differences (diff.) and imagery differences (ImageNet-1K/CIFAR-10).
  • Figure 3: Overall performance of NRF models trained on the non-robust feature (NRF) dataset, tested on the NRF and original dataset respectively. NRF datasets are generated with $\ell_\infty$ PGD at strengths $\epsilon \in \{0.5, 0.25, 4/255\}$.
  • Figure 4: Standard and robust performance of different backbones across a subset of negative subclasses and the overall negative class. Each group of bars represents the average precision of the negative (sub)class against the positive drone class for different classifiers. The first green bar is the random baseline (calculated in expectation), followed by pairs of bars which are the standard and robust performance of each model of the following order: MobileNetV3 small (Mob.), EfficientNetB0 (Eff.) and ResNet18 (Res.). Classes are grouped by approximately increasing adversarial drift from left to right, with semantic differences (diff.) and imagery differences (ImageNet-1K/CIFAR-10).
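
The Figure 3 caption refers to $\ell_\infty$ PGD at a given strength $\epsilon$. As a rough illustration of what that attack does, here is a minimal NumPy sketch under stated assumptions: the model is a toy logistic-regression scorer standing in for the paper's backbones, and the names `pgd_linf`, `bce_loss`, `step`, and `n_steps` are hypothetical choices for this example, not taken from the paper. Each iteration takes a signed-gradient ascent step on the loss and then projects back into the $\epsilon$-ball around the clean input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(x, y, w, b):
    """Per-example binary cross-entropy of a logistic scorer (toy stand-in model)."""
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1.0 - 1e-12)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def pgd_linf(x, y, w, b, eps, step, n_steps):
    """l_inf PGD: signed-gradient ascent on the loss, projected onto the eps-ball.

    x: (n, d) inputs in [0, 1]; y: (n,) labels in {0, 1};
    w: (d,) weights and b: scalar bias of the assumed logistic scorer.
    """
    x_adv = x.copy()
    for _ in range(n_steps):
        p = sigmoid(x_adv @ w + b)
        # Gradient of the BCE loss with respect to the input: (p - y) * w.
        grad = (p - y)[:, None] * w[None, :]
        # Ascent step that increases the loss.
        x_adv = x_adv + step * np.sign(grad)
        # Project back into the l_inf ball of radius eps around the clean input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        # Keep the result a valid image-like input.
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

With `eps = 4 / 255`, for instance, every coordinate of the returned input differs from the clean input by at most `4 / 255`. The step size and iteration count are free parameters here; the values used in the paper are not stated in this section.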