
Test-Time Backdoor Detection for Object Detection Models

Hangtao Zhang, Yichen Wang, Shihui Yan, Chenyu Zhu, Ziqi Zhou, Linshan Hou, Shengshan Hu, Minghui Li, Yanjun Zhang, Leo Yu Zhang

TL;DR

This work addresses the vulnerability of object detection (OD) models to backdoor attacks by introducing TRACE, a black-box, test-time backdoor detector for object detection. TRACE leverages two semantic-aware transformation strategies: contextual information transformation (varying backgrounds to measure detection consistency) and focal information transformation (inserting foreground patches to probe saliency). A test-time evaluation then combines the two, using the variance of detection results under each transformation to distinguish poisoned from clean inputs. Across three datasets, three detectors, and seven backdoor attacks, the method performs robustly regardless of the attack, reaching an average AUROC of about 0.90 and an average F1 of about 0.88, and it remains resilient under adaptive threat models. While TRACE is practical and universal, it incurs time overhead and relies on public auxiliary data; future work aims to improve efficiency and explore zero-shot detection scenarios. Overall, TRACE supports the safe deployment of OD systems in MLaaS and other settings by acting as a firewall that blocks backdoor activation at test time, without requiring training data or knowledge of the attack.
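The contextual transformation can be illustrated with a small sketch. It is not the authors' implementation, and several pieces are assumptions: images are plain nested lists of grayscale values, the background change is modeled as alpha blending with auxiliary background images (the `alpha=0.7` default is arbitrary), `detect` is a hypothetical callable returning per-object confidences, and the consistency statistic is taken to be the variance of the highest confidence across backgrounds.

```python
from statistics import pvariance
from typing import Callable, List, Sequence

# Hypothetical types: a grayscale image as rows of pixel values in [0, 1],
# and a detector that returns one confidence score per detected object.
Image = List[List[float]]
Detector = Callable[[Image], Sequence[float]]


def blend(image: Image, background: Image, alpha: float) -> Image:
    """Alpha-blend a test image onto an auxiliary background of the same size."""
    return [
        [alpha * p + (1.0 - alpha) * b for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


def contextual_variance(image: Image, backgrounds: Sequence[Image],
                        detect: Detector, alpha: float = 0.7) -> float:
    """Variance of the top confidence across background-blended copies.

    Assumption: following the observation above, poisoned inputs should give
    a LOWER value than clean ones, since the trigger keeps driving the output.
    """
    top_scores = []
    for bg in backgrounds:
        scores = detect(blend(image, bg, alpha))
        top_scores.append(max(scores, default=0.0))  # 0.0 when nothing is detected
    return pvariance(top_scores)
```

A detector whose output is pinned by a trigger (for example, one that always returns `[0.9]`) yields a variance of 0.0, which is the low-variance behavior described above for poisoned inputs.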

Abstract

Object detection models are vulnerable to backdoor attacks, in which attackers poison a small subset of training samples by embedding a predefined trigger to manipulate predictions. Detecting poisoned samples (i.e., those containing triggers) at test time can prevent backdoor activation. However, unlike image classification, object detection has unique characteristics, particularly its output of numerous objects, that pose new challenges for backdoor detection. Its complex attack effects (e.g., the emergence of "ghost" objects or the "vanishing" of real objects) further render current defenses fundamentally inadequate. To this end, we design TRAnsformation Consistency Evaluation (TRACE), a new method for detecting poisoned samples at test time in object detection. Our approach starts from two observations: (1) poisoned samples exhibit significantly more consistent detection results than clean ones across varied backgrounds; and (2) clean samples show higher detection consistency than poisoned ones when different focal information is introduced. Based on these phenomena, TRACE applies foreground and background transformations to each test sample, then assesses transformation consistency by calculating the variance in object confidences. TRACE achieves black-box, universal backdoor detection; extensive experiments show a 30% improvement in AUROC over state-of-the-art defenses, as well as resistance to adaptive attacks.
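The foreground (focal) transformation can be sketched in the same hedged spirit. None of the following details come from the paper: images are nested lists of grayscale values, focal information is introduced by pasting hypothetical foreground `patches` at a few hypothetical `positions`, `detect` is a hypothetical callable returning per-object confidences, and consistency is measured as the variance of the highest confidence.

```python
from statistics import pvariance
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: grayscale image as rows of pixels, detector returning
# one confidence score per detected object.
Image = List[List[float]]
Detector = Callable[[Image], Sequence[float]]


def paste_patch(image: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of `image` with `patch` pasted at (top, left), clipped to bounds."""
    out = [row[:] for row in image]  # copy so the input image is not mutated
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            r, c = top + i, left + j
            if 0 <= r < len(out) and 0 <= c < len(out[r]):
                out[r][c] = value
    return out


def focal_variance(image: Image, patches: Sequence[Image],
                   positions: Sequence[Tuple[int, int]], detect: Detector) -> float:
    """Variance of the top confidence as foreground patches are inserted.

    Assumption: following the second observation above, clean inputs should
    stay more consistent (give a LOWER value) than poisoned ones.
    """
    top_scores = [
        max(detect(paste_patch(image, patch, top, left)), default=0.0)
        for patch in patches
        for top, left in positions
    ]
    return pvariance(top_scores)
```

How TRACE actually chooses patches and insertion positions is not specified in this summary; the sketch only shows the shape of the computation.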


Paper Structure

This paper contains 15 sections, 8 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Divergent attack effects of OD backdoors. (a-c): backdoor triggered by a fixed pattern. (d): backdoor triggered by the co-occurrence of two natural objects (a person and an umbrella).
  • Figure 2: Effect of background shift on (a) ResNet-50 and (b) YOLO [redmon2016you], Faster R-CNN [ren2015faster], and DETR [carion2020end]. (b): for each background case, line plots show the maximum, minimum, and average confidence across the three detectors. Larger shaded areas indicate greater instability in object predictions.
  • Figure 3: YOLO activation maps: clean objects vs. false-negative (FN)-inducing triggers.
  • Figure 4: Background blending and YOLO confidence distribution.
  • Figure 5: Effect of the foreground object's position on YOLO detections. (d) uses an FN-inducing trigger placed at the image center.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Definition 1: Contextual Transformation Consistency (CTC)
  • Definition 2: Focal Transformation Consistency (FTC)
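The two definitions are not reproduced in this summary, so the following is only a hypothetical decision rule showing how one contextual and one focal variance per test image might be turned into a verdict. It assumes thresholds are calibrated on clean public auxiliary samples (the TL;DR notes that TRACE relies on such data) and that an input is flagged when its contextual variance is unusually low or its focal variance is unusually high. The names `calibrate`, `flag_poisoned`, and `fpr` are illustrative, not TRACE's interface.

```python
from typing import Sequence, Tuple


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Simple rank-based quantile of `values` for q in [0, 1]."""
    if not values:
        raise ValueError("need at least one reference value")
    ordered = sorted(values)
    index = round(q * (len(ordered) - 1))
    return ordered[min(len(ordered) - 1, max(0, index))]


def calibrate(clean_ctc: Sequence[float], clean_ftc: Sequence[float],
              fpr: float = 0.05) -> Tuple[float, float]:
    """Hypothetical thresholds from clean auxiliary samples.

    Returns (ctc_floor, ftc_ceiling): the low tail of clean contextual
    variances and the high tail of clean focal variances.
    """
    return empirical_quantile(clean_ctc, fpr), empirical_quantile(clean_ftc, 1.0 - fpr)


def flag_poisoned(ctc_var: float, ftc_var: float,
                  thresholds: Tuple[float, float]) -> bool:
    """Flag an input whose detections are too stable across backgrounds
    or too unstable under focal changes, relative to clean data."""
    ctc_floor, ftc_ceiling = thresholds
    return ctc_var < ctc_floor or ftc_var > ftc_ceiling
```

Because each statistic gets its own tail of size `fpr`, the combined false-positive rate on clean data can reach `2 * fpr`; a stricter budget would split it between the two tests.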