Optimal Zero-Shot Detector for Multi-Armed Attacks

Federica Granese, Marco Romanelli, Pablo Piantanida

TL;DR

This work addresses the challenge of defending a classifier against a malicious, multi-armed attacker when no training data is available for defense. It introduces an information-theoretic minimax framework that optimally aggregates off-the-shelf detectors into a single zero-shot detector, with weights determined by mutual-information optimization. A computable surrogate and a Blahut–Arimoto–style algorithm yield a practical mixture detector whose output is thresholded to decide adversarial inputs. Empirical evaluation on CIFAR-10 and SVHN with a pre-trained ResNet-18 shows substantial and robust improvements over state-of-the-art adversarial detectors in multi-armed attack scenarios, while maintaining modularity and adaptability for future detectors. The approach offers a training-free, scalable defense that can generalize to related security problems such as intrusion and anomaly detection.
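To make the aggregation step concrete, the following is a minimal sketch, not the authors' implementation. It assumes each off-the-shelf detector outputs a probability that a given input is adversarial, treats the detectors as the inputs of a channel whose output is the binary decision, and runs the textbook Blahut-Arimoto capacity iteration to obtain mutual-information-maximizing weights for that input. The function names, the per-input weighting, and the 0.5 threshold are illustrative assumptions; the paper's computable surrogate may differ in its details.

```python
import math

def blahut_arimoto_weights(channel, iters=200, tol=1e-10):
    """Standard Blahut-Arimoto iteration for channel capacity.

    channel[k][z] = P(z | detector k). Returns weights over detectors
    that (approximately) maximize the mutual information I(k; z).
    """
    k = len(channel)
    n_out = len(channel[0])
    w = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(iters):
        # Output marginal induced by the current weights.
        q = [sum(w[i] * channel[i][z] for i in range(k)) for z in range(n_out)]
        # KL divergence of each detector's output law from the marginal.
        d = [sum(p * math.log(p / q[z]) for z, p in enumerate(row) if p > 0)
             for row in channel]
        unnorm = [w[i] * math.exp(d[i]) for i in range(k)]
        total = sum(unnorm)
        new_w = [u / total for u in unnorm]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w

def detect(p_adv, threshold=0.5):
    """Aggregate per-detector adversarial probabilities for ONE input.

    p_adv[k] is detector k's probability that the input is adversarial
    (hypothetical interface). Returns (mixture score, flagged?).
    """
    channel = [[1.0 - p, p] for p in p_adv]
    w = blahut_arimoto_weights(channel)
    score = sum(wi * pi for wi, pi in zip(w, p_adv))
    return score, score > threshold
```

For example, `detect([0.9, 0.9, 0.9])` keeps the weights uniform because the detectors agree, so the mixture score equals their common output and the input is flagged. No detector is trained or fine-tuned in this sketch, which mirrors the training-free setting described above.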

Abstract

This paper explores a scenario in which a malicious actor employs a multi-armed attack strategy to manipulate data samples, giving the attacker several avenues for introducing noise into the dataset. Our central objective is to protect the data by detecting any alterations to the input. We approach this defense conservatively, operating in a setting where the defender has significantly less information than the attacker. Specifically, the defender cannot use any data samples to train a defense model or to verify the integrity of the channel. Instead, the defender relies exclusively on a set of pre-existing detectors readily available "off the shelf". To tackle this challenge, we derive an information-theoretic defense approach that optimally aggregates the decisions made by these detectors, eliminating the need for any training data. We further explore a practical use-case scenario for empirical evaluation, where the attacker possesses a pre-trained classifier and launches well-known adversarial attacks against it. Our experiments highlight the effectiveness of our proposed solution, even in scenarios that deviate from the optimal setup.

Paper Structure (35 sections, 21 equations, 8 figures, 23 tables)

Figures (8)

  • Figure 1: The shallow detectors are named after the loss function used to craft the attacks they are trained to detect. Overall, NSS clearly outperforms all the individual shallow detectors. The aggregation we propose allows us to use the shallow models to attain a detector whose performance is consistently comparable to, and in many cases better than, that of NSS.
  • Figure 2: In \ref{fig:acc_pgd1} and \ref{fig:acc_fgsm}, the accuracies of the detectors on natural and adversarial examples; in \ref{fig:hist_salad} and \ref{fig:hist_nss}, how the proposed method and NSS split the data samples. Results for the detection of adversarial examples are shown in pink, and results for the detection of natural examples in blue.
  • Figure 3: Attacks created with L$_\infty$ norm, PGD and $\varepsilon=0.03125$. The attack losses are given in the captions. The shallow detectors are named after the loss function used to craft the attacks they are trained to detect.
  • Figure 4: The shallow detectors are named after the loss function used to craft the attacks they are trained to detect.
  • Figure 5: Results for the adversarial examples are shown in pink and those for the natural examples in blue. In this simulation, we consider a subset of the available detectors (ACE, KL, FR). Under each plot, we indicate the tested attack configuration parameters: algorithm-L$_p$-$\varepsilon$-loss.
  • ...and 3 more figures

Theorems & Definitions (6)

  • Definition 1
  • proof
  • proof
  • proof
  • proof
  • proof