Query Provenance Analysis: Efficient and Robust Defense against Query-based Black-box Attacks

Shaofei Li, Ziqi Zhang, Haomin Jia, Ding Li, Yao Guo, Xiangqun Chen

TL;DR

This paper proposes a novel approach, Query Provenance Analysis (QPA), for defending against query-based black-box attacks robustly (against both non-adaptive and adaptive attacks) and efficiently (in real-time).

Abstract

Query-based black-box attacks have emerged as a significant threat to machine learning systems: adversaries manipulate input queries to generate adversarial examples that cause the model to misclassify. To counter these attacks, researchers have proposed Stateful Defense Models (SDMs), which detect adversarial query sequences and reject queries that are "similar" to historical queries. Existing state-of-the-art (SOTA) SDMs (e.g., BlackLight and PIHA) have shown great effectiveness in defending against these attacks. However, recent studies have shown that they are vulnerable to Oracle-guided Adaptive Rejection Sampling (OARS), a stronger adaptive attack strategy. OARS can be easily integrated with existing attack algorithms to evade SDMs: it uses the decision information leaked by the SDM to fine-tune the direction and step size of its perturbations. In this paper, we propose a novel approach, Query Provenance Analysis (QPA), for more robust and efficient SDMs. QPA encapsulates the historical relationships among queries as a sequence feature to capture the fundamental difference between benign and adversarial query sequences. To utilize the query provenance, we propose an efficient query provenance analysis algorithm with dynamic management. We evaluate QPA against two baselines, BlackLight and PIHA, on four widely used datasets with six query-based black-box attack algorithms. The results show that QPA outperforms the baselines in both defense effectiveness and efficiency, on non-adaptive and adaptive attacks alike. Specifically, QPA reduces the Attack Success Rate (ASR) of OARS to 4.08%, compared with 77.63% and 87.72% for BlackLight and PIHA, respectively. Moreover, QPA achieves 7.67x and 2.25x higher throughput than BlackLight and PIHA, respectively.
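
The threshold-based rejection that the abstract attributes to SDMs can be pictured with a minimal sketch. The class name `ThresholdSDM`, the Euclidean distance on raw inputs, and the threshold value are all illustrative assumptions; this is not the BlackLight or PIHA implementation, only a toy model of "reject queries that are similar to historical queries".

```python
import math
from typing import List, Sequence


class ThresholdSDM:
    """Toy stateful detector: flags a query that is close to any stored query."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical distance threshold
        self.history: List[Sequence[float]] = []

    def is_attack(self, query: Sequence[float]) -> bool:
        # Flag the query if it lies within `threshold` of any past query.
        flagged = any(
            math.dist(query, past) < self.threshold for past in self.history
        )
        # Every query is stored, flagged or not, so later queries are compared to it.
        self.history.append(query)
        return flagged


sdm = ThresholdSDM(threshold=0.5)
first = sdm.is_attack((0.0, 0.0))    # nothing in history yet
near = sdm.is_attack((0.1, 0.0))     # 0.1 away from a stored query
# A chain of queries spaced 0.6 apart never comes within 0.5 of a stored query.
chain = [sdm.is_attack((10.0 + 0.6 * k, 0.0)) for k in range(10)]
```

In this toy setting `near` is flagged while every query in `chain` passes, which is the kind of gap an adaptive attacker can exploit once it learns roughly where the threshold lies.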

Paper Structure (27 sections, 4 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: A motivating example of a query provenance graph. $x_{t}$ indicates the start point, $x_{t} + \delta_{t}$ indicates the perturbed queries, and $x_{t}^{\prime}$ indicates the start point of the next iteration. The attack query sequence forms a highly organized graph structure, while the benign query sequence exhibits random aggregation. Existing SDMs employ a threshold-based decision boundary to detect attack queries, which leaves them vulnerable to OARS: it can generate queries that fall outside the decision boundary while staying inside the perturbation budget.
  • Figure 2: Workflow of our system with QPA. The system receives the query stream as input and constructs the query provenance graph based on the similarity between queries. It analyzes the query provenance graph to update the anomaly set periodically. As a result, it rejects malicious queries and returns model outputs only for benign ones.
  • Figure 3: Throughput of QPA compared with BlackLight, PIHA, and QPA without dynamic management. Throughput is the number of queries processed per second.
  • Figure 4: Latency of QPA compared with BlackLight, PIHA, and QPA without dynamic management. Latency is the average time taken to process a query. The x-axis represents the proportion of the total sequence and the y-axis represents the response latency. Each point in the graph represents the average latency of 500 queries.
  • Figure 5: Comparison of the individual features used by existing SDMs with the sequence feature of our system in detecting NES-OARS. The x-axis represents the number of queries. The y-axis represents the maximum proportion of matched hash values for the individual feature distance, and the normalized anomaly score for the sequence feature distance. (a) and (b) are the results of the two baselines, (c) shows the individual features of our system, and (d) shows the sequence feature of our system. The blue dashed line represents the detection threshold.
  • ...and 3 more figures
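
The workflow described in the Figure 2 caption (link similar queries into a graph, analyze the graph periodically, reject queries tied to the anomaly set) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: the class `ProvenanceSketch`, the parameters `link_radius`, `max_component` and `period`, the Euclidean similarity, and the use of connected-component size as the anomaly score are all hypothetical stand-ins.

```python
import math
from typing import List, Sequence, Set


class ProvenanceSketch:
    """Toy provenance graph: nodes are queries, edges join similar queries."""

    def __init__(self, link_radius: float, max_component: int, period: int) -> None:
        self.link_radius = link_radius      # hypothetical similarity radius for edges
        self.max_component = max_component  # hypothetical size limit for a benign cluster
        self.period = period                # analyze the graph every `period` queries
        self.queries: List[Sequence[float]] = []
        self.parent: List[int] = []         # union-find forest over query indices
        self.size: List[int] = []           # component size, valid at each root
        self.anomalous: Set[int] = set()    # indices of queries in the anomaly set

    def _find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def _union(self, a: int, b: int) -> None:
        ra, rb = self._find(a), self._find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def analyze(self) -> None:
        # Stand-in anomaly score: a query is anomalous if its component is too large.
        self.anomalous = {
            i for i in range(len(self.queries))
            if self.size[self._find(i)] > self.max_component
        }

    def add_query(self, query: Sequence[float]) -> bool:
        """Insert a query into the graph; return True if it should be rejected."""
        idx = len(self.queries)
        self.queries.append(query)
        self.parent.append(idx)
        self.size.append(1)
        rejected = False
        for j in range(idx):  # linear scan, kept simple for the sketch
            if math.dist(query, self.queries[j]) < self.link_radius:
                self._union(idx, j)
                rejected = rejected or j in self.anomalous
        if rejected:
            self.anomalous.add(idx)  # queries linked to the anomaly set join it
        if idx % self.period == self.period - 1:
            self.analyze()           # periodic update of the anomaly set
        return rejected


# An organized chain of small steps versus widely scattered queries.
attack = ProvenanceSketch(link_radius=1.0, max_component=5, period=5)
attack_results = [attack.add_query((0.6 * k, 0.0)) for k in range(20)]
benign = ProvenanceSketch(link_radius=1.0, max_component=5, period=5)
benign_results = [benign.add_query((10.0 * k, 10.0 * k)) for k in range(20)]
```

In this toy run the chained queries form one growing component, so every query after the second periodic analysis is rejected, while the scattered queries never link to each other and are all served. The linear scan over all stored queries is the cost that a production system would need to bound, which is presumably what the dynamic management mentioned in the abstract addresses.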