Backdooring Outlier Detection Methods: A Novel Attack Approach
ZeinabSadat Taghavi, Hossein Mirzaei
TL;DR
The paper addresses the vulnerability of open-set outlier detection to backdoor attacks, a gap left by prior work focused on closed-set classification. It introduces BATOD, which targets the open-set boundary with two trigger types: In-Triggers, which cause outliers to be misclassified as inliers, and Out-Triggers, which bias inliers toward uncertain, outlier-like predictions. Both trigger types are trained using a surrogate model and a discriminator that distinguishes inliers from outliers. Through synthetic outlier generation using negative transformations and adversarial trigger design, BATOD degrades open-set performance while maintaining closed-set accuracy, and it remains effective against several defenses across multiple OSR and OOD datasets. The results underscore the need for defense strategies that specifically address open-set backdoors, with implications for safety-critical deployments in autonomous driving, medical imaging, and biometric systems.
Abstract
Prior work on backdoor attacks has focused primarily on the closed-set performance of classifiers (i.e., classification). This leaves the threat to classifiers' open-set performance, referred to as outlier detection in the literature, largely unaddressed. Reliable outlier detection is crucial for deploying classifiers in critical real-world applications such as autonomous driving and medical image analysis. We first show that existing backdoor attacks fall short in affecting the open-set performance of classifiers, as they have been specifically designed to confuse intra-closed-set decision boundaries. In contrast, an effective backdoor attack for outlier detection needs to confuse the decision boundary between the closed and open sets. Motivated by this, we propose BATOD, a novel Backdoor Attack targeting the Outlier Detection task. Specifically, we design two categories of triggers: one makes inlier samples appear as outliers, and the other makes outlier samples appear as inliers. We evaluate BATOD on various real-world datasets and demonstrate its superior ability to degrade the open-set performance of classifiers compared to previous attacks, both before and after applying defenses.
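To make the open-set objective concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes the detector scores inputs by maximum softmax probability (MSP), a common baseline that the abstract does not name, and it uses invented logits to stand in for a backdoored model's outputs. The function names and all numbers are hypothetical. It shows only why flipping confidence on triggered inputs collapses AUROC, the usual open-set metric.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one logit vector.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_score(logits):
    # Maximum softmax probability: higher means "more inlier-like".
    # Assumption: MSP is the outlier score; the paper may use another detector.
    return max(softmax(logits))

def auroc(inlier_scores, outlier_scores):
    # Probability that a random inlier outscores a random outlier (ties count 0.5).
    wins = 0.0
    for s_in in inlier_scores:
        for s_out in outlier_scores:
            if s_in > s_out:
                wins += 1.0
            elif s_in == s_out:
                wins += 0.5
    return wins / (len(inlier_scores) * len(outlier_scores))

# Invented logits for clean inputs: inliers are confident, outliers are not.
clean_inlier_logits = [[6.0, 0.5, 0.2], [0.1, 5.5, 0.3], [0.4, 0.2, 7.0]]
clean_outlier_logits = [[1.0, 0.9, 1.1], [0.5, 0.7, 0.6], [1.2, 1.0, 0.8]]

# Invented logits for triggered inputs on a hypothetical backdoored model:
# an Out-Trigger flattens inlier logits, an In-Trigger sharpens outlier logits.
triggered_inlier_logits = [[1.0, 0.9, 1.1], [0.6, 0.7, 0.5], [0.9, 1.0, 1.1]]
triggered_outlier_logits = [[6.0, 0.4, 0.3], [0.2, 5.0, 0.1], [0.3, 0.2, 6.5]]

clean_auroc = auroc(
    [msp_score(z) for z in clean_inlier_logits],
    [msp_score(z) for z in clean_outlier_logits],
)
attacked_auroc = auroc(
    [msp_score(z) for z in triggered_inlier_logits],
    [msp_score(z) for z in triggered_outlier_logits],
)

print(f"clean AUROC:    {clean_auroc:.2f}")
print(f"attacked AUROC: {attacked_auroc:.2f}")
```

The toy numbers are deliberately extreme, so the separation inverts completely. A real attack would be measured on actual model outputs, and the degradation would depend on the dataset, detector, and defense.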
