On the Adversarial Robustness of Learning-based Conformal Novelty Detection

Daofu Zhang, Mehrdad Pournaderi, Hanne M. Clifford, Yu Xiang, Pramod K. Varshney

TL;DR

This work examines the adversarial robustness of AdaDetect, a learning-based conformal novelty detector that provides finite-sample FDR control under exchangeability. It develops an oracle worst-case attack that yields the upper bound $\text{FDR}^*_{\text{attack}} \le \alpha + m_a \cdot \mathbb{E}\big[1/(\widetilde{R} \lor 1)\big]$, and a practical surrogate decision-based attack that requires only query access to AdaDetect's output labels, instantiated with the black-box attacks HopSkipJumpAttack and Boundary Attack. Empirical results on synthetic and real-world datasets show that adversarial perturbations can substantially inflate FDR while often preserving or increasing power, underscoring fundamental vulnerabilities in current error-controlled novelty detection methods. The findings motivate defenses, such as robust training and smoothing techniques, that preserve FDR guarantees under attack, and invite study of more realistic attacks that target the training or calibration data.
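
AdaDetect's FDR guarantee rests on conformal p-values followed by the Benjamini–Hochberg (BH) procedure. The sketch below shows only that final stage, assuming the novelty scores are already computed (higher means more novel); it uses synthetic Gaussian scores rather than AdaDetect's learned classifier, and all helper names are hypothetical. The last helper just evaluates the right-hand side of the attack bound from supplied draws of $\widetilde{R}$, whose precise definition is given in the paper; any draws passed to it here are placeholders, not results.

```python
import random


def conformal_pvalues(cal_scores, test_scores):
    """Conformal p-values, assuming a higher score means 'more novel'.

    p_j = (1 + #{i : cal_i >= test_j}) / (n + 1), with n calibration scores.
    """
    n = len(cal_scores)
    return [(1 + sum(c >= t for c in cal_scores)) / (n + 1) for t in test_scores]


def benjamini_hochberg(pvalues, alpha):
    """Step-up BH procedure; returns one boolean per hypothesis (True = rejected)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    # Largest rank k (1-based) with p_(k) <= alpha * k / m; 0 if there is none.
    k_max = 0
    for rank, j in enumerate(order, start=1):
        if pvalues[j] <= alpha * rank / m:
            k_max = rank
    reject = [False] * m
    for j in order[:k_max]:
        reject[j] = True
    return reject


def fdp_and_power(reject, is_novelty):
    """False discovery proportion V / (R v 1) and power (fraction of novelties found)."""
    r = sum(reject)
    v = sum(rej and not nov for rej, nov in zip(reject, is_novelty))
    tp = sum(rej and nov for rej, nov in zip(reject, is_novelty))
    return v / max(r, 1), tp / max(sum(is_novelty), 1)


def attack_bound_rhs(alpha, m_a, r_tilde_samples):
    """Monte Carlo estimate of alpha + m_a * E[1 / (R~ v 1)] from supplied draws of R~."""
    return alpha + m_a * sum(1 / max(r, 1) for r in r_tilde_samples) / len(r_tilde_samples)


if __name__ == "__main__":
    rng = random.Random(0)
    cal = [rng.gauss(0, 1) for _ in range(500)]  # synthetic null calibration scores
    test = [rng.gauss(0, 1) for _ in range(90)] + [rng.gauss(3, 1) for _ in range(10)]
    is_novelty = [False] * 90 + [True] * 10
    reject = benjamini_hochberg(conformal_pvalues(cal, test), alpha=0.1)
    fdp, power = fdp_and_power(reject, is_novelty)
    print(f"rejections={sum(reject)}  FDP={fdp:.3f}  power={power:.2f}")
```

A single run prints the rejection count with the empirical FDP and power for one synthetic draw; averaging the FDP over many independent draws estimates the FDR, which in this unattacked, exchangeable setting should stay at or below α = 0.1.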

Abstract

This paper studies the adversarial robustness of conformal novelty detection. In particular, we focus on AdaDetect, a powerful learning-based framework for novelty detection with finite-sample false discovery rate (FDR) control. While AdaDetect provides rigorous statistical guarantees under benign conditions, its behavior under adversarial perturbations remains unexplored. We first formulate an oracle attack setting that quantifies the worst-case degradation of FDR, deriving an upper bound that characterizes the statistical cost of attacks. This idealized formulation directly motivates a practical and effective attack scheme that only requires query access to AdaDetect's output labels. Coupling these formulations with two popular and complementary black-box adversarial algorithms, we systematically evaluate the vulnerability of AdaDetect on synthetic and real-world datasets. Our results show that adversarial perturbations can significantly increase the FDR while maintaining high detection power, exposing fundamental limitations of current error-controlled novelty detection methods and motivating the development of more robust alternatives.
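
The attack described in the abstract needs only AdaDetect's output labels. A core primitive of label-only attacks such as HopSkipJumpAttack is a binary search along the segment between a point the detector does not flag and one it does, which lands just on the flagged side of the decision boundary. The sketch below is a minimal version of that single step, not the paper's surrogate attack: `is_flagged` is a hypothetical oracle standing in for one query to the detector, and the toy oracle flags any point whose Euclidean norm exceeds 2. In AdaDetect the label of a point also depends on the rest of the test batch through BH, which this fixed oracle ignores.

```python
import math


def boundary_bisection(is_flagged, x_null, x_flagged, tol=1e-6, max_queries=100):
    """Label-only bisection between an unflagged and a flagged point.

    `is_flagged` is a hypothetical query oracle returning True for 'novelty'.
    Returns (point, queries): a flagged point within `tol` (Euclidean distance
    along the segment) of the boundary, and the number of oracle queries used.
    """
    if is_flagged(x_null) or not is_flagged(x_flagged):
        raise ValueError("need an unflagged start point and a flagged end point")
    queries = 2
    lo, hi = 0.0, 1.0  # interpolation weights: lo stays unflagged, hi stays flagged
    seg_len = math.dist(x_null, x_flagged)

    def interp(t):
        return [a + t * (b - a) for a, b in zip(x_null, x_flagged)]

    while (hi - lo) * seg_len > tol and queries < max_queries:
        mid = (lo + hi) / 2
        queries += 1
        if is_flagged(interp(mid)):
            hi = mid
        else:
            lo = mid
    return interp(hi), queries


if __name__ == "__main__":
    def toy_is_flagged(x):
        # Hypothetical detector: flags points with Euclidean norm above 2.
        return math.hypot(*x) > 2.0

    x_adv, n_queries = boundary_bisection(toy_is_flagged, [0.5, 0.0], [5.0, 0.0])
    print(x_adv, n_queries)
```

Starting from a null point at (0.5, 0) and a flagged point at (5, 0), the search returns a point just past the boundary at norm 2 after 25 oracle queries. A full decision-based attack would then iterate from such boundary points, for example by estimating a direction along the boundary and stepping toward smaller perturbations.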

Paper Structure

This paper contains 26 sections, 10 theorems, 42 equations, 8 tables.

Key Result

Proposition 1

$f_{\text{attack}}$ does not depend on the order of elements in $\{Z_1, \ldots, Z_n, Z_{n+j} : j \in \mathcal{H}_0 \setminus \mathcal{A}\}$.
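
In AdaDetect-style detectors, invariance of this kind is what keeps the scores of untouched null points exchangeable with the calibration scores, which the conformal FDR guarantee relies on. The toy sketch below illustrates the property only and is not the paper's $f_{\text{attack}}$. It assumes a hypothetical 1-D score, the log density ratio between a Gaussian fitted to a pooled sample and one fitted to null training data; because the pooled sample enters only through order-free statistics, shuffling it leaves every score unchanged.

```python
import math
import random


def fit_gaussian(values):
    """Return (mean, std) of `values` using only order-free statistics.

    math.fsum tracks exact partial sums, so its result does not depend on the
    order in which the values are summed.
    """
    n = len(values)
    mu = math.fsum(values) / n
    var = math.fsum((v - mu) ** 2 for v in values) / n
    return mu, math.sqrt(var)


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def fit_score_fn(null_train, pooled):
    """Hypothetical toy score: log density ratio of a Gaussian fitted to `pooled`
    (calibration + test points) over one fitted to `null_train`.
    Higher scores mean 'more novel'."""
    mu0, s0 = fit_gaussian(null_train)
    mu1, s1 = fit_gaussian(pooled)
    return lambda x: log_normal_pdf(x, mu1, s1) - log_normal_pdf(x, mu0, s0)


if __name__ == "__main__":
    rng = random.Random(1)
    null_train = [rng.gauss(0, 1) for _ in range(200)]
    pooled = [rng.gauss(0, 1) for _ in range(150)] + [rng.gauss(3, 1) for _ in range(20)]
    shuffled = pooled[:]
    rng.shuffle(shuffled)
    f = fit_score_fn(null_train, pooled)
    g = fit_score_fn(null_train, shuffled)
    # Check that every point gets the same score whatever the pooled order.
    print(all(f(x) == g(x) for x in pooled))
```

A learned classifier trained on the pooled sample generally needs extra care (for example, treating the pooled points as an unordered set during training) to achieve the same invariance.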

Theorems & Definitions (19)

  • Proposition 1
  • Remark 1
  • Theorem 1
  • Remark 2
  • Lemma 1
  • Remark 3
  • Lemma 2
  • Lemma 3
  • Remark 4
  • Proof
  • ...and 9 more