Towards Robust Object Detection: Identifying and Removing Backdoors via Module Inconsistency Analysis

Xianda Zhang, Siyuan Liang

TL;DR

This work presents the first approach that addresses both the detection and removal of backdoors in two-stage object detection models, advancing the field of securing these complex systems against backdoor attacks.

Abstract

Object detection models, widely used in security-critical applications, are vulnerable to backdoor attacks that cause targeted misclassifications when triggered by specific patterns. Existing backdoor defense techniques, primarily designed for simpler models like image classifiers, often fail to detect and remove backdoors in object detectors. We propose a backdoor defense framework tailored to object detection models, based on the observation that backdoor attacks cause significant inconsistencies between the behaviors of local modules, such as the Region Proposal Network (RPN) and the classification head. By quantifying and analyzing these inconsistencies, we develop an algorithm to detect backdoors. We find that the inconsistent module is usually the main source of backdoor behavior, which leads to a removal method that localizes the affected module, resets its parameters, and fine-tunes the model on a small clean dataset. Extensive experiments with state-of-the-art two-stage object detectors show our method achieves a 90% improvement in backdoor removal rate over fine-tuning baselines, while limiting clean data accuracy loss to less than 4%. To the best of our knowledge, this work presents the first approach that addresses both the detection and removal of backdoors in two-stage object detection models, advancing the field of securing these complex systems against backdoor attacks.
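
The detect-then-reset idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the inconsistency metric (absolute difference between an RPN objectness score and the classification head's confidence for the same proposal), the threshold value, the function names, and the representation of a model as a dictionary of parameter lists. The final fine-tuning step on clean data is omitted.

```python
import random
from statistics import mean


def inconsistency_scores(rpn_objectness, cls_confidence):
    """Per-proposal disagreement between two modules.

    Hypothetical metric: absolute difference of the two scores.
    The paper's actual definition is not given in the abstract.
    """
    if len(rpn_objectness) != len(cls_confidence):
        raise ValueError("expected one score per proposal from each module")
    return [abs(r - c) for r, c in zip(rpn_objectness, cls_confidence)]


def is_suspicious(rpn_objectness, cls_confidence, threshold=0.5):
    """Flag a model when the mean disagreement exceeds a chosen threshold.

    The threshold of 0.5 is an arbitrary illustrative value.
    """
    scores = inconsistency_scores(rpn_objectness, cls_confidence)
    return mean(scores) > threshold


def reset_module(params, module_name, seed=0):
    """Return a copy of params with one module re-initialized at random.

    `params` is a hypothetical stand-in for a model: module name -> weights.
    The other modules are left untouched; fine-tuning would follow.
    """
    rng = random.Random(seed)
    fresh = dict(params)
    fresh[module_name] = [rng.gauss(0.0, 0.01) for _ in params[module_name]]
    return fresh


# Modules that agree on each proposal: low mean inconsistency.
clean_flag = is_suspicious([0.9, 0.8, 0.1], [0.85, 0.75, 0.15])

# Classification head is confident where the RPN is not: high inconsistency.
poisoned_flag = is_suspicious([0.1, 0.2, 0.1], [0.95, 0.9, 0.9])

model = {"rpn": [1.0, 2.0], "cls_head": [3.0, 4.0]}
repaired = reset_module(model, "cls_head")
```

In this toy setup only the flagged module is reset, which mirrors the abstract's claim that the inconsistent module is usually the main source of backdoor behavior; how the affected module is actually localized is described in the paper itself.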

Paper Structure (16 sections, 7 equations, 7 figures, 3 tables, 2 algorithms)

Figures (7)

  • Figure 1: The backdoor can be exposed by the inconsistency of different modules.
  • Figure 2: Our approach consists of two main stages: (1) Cross-Module Inconsistency Detection for identifying the presence of backdoors, and (2) Targeted Reset Finetuning for removing the detected backdoors while maintaining the model's performance on clean data.
  • Figure 3: The inconsistency scores around the trigger are significantly higher than those at the corresponding locations in clean samples. As reflected in the histograms, the mean inconsistency scores of the toxic samples are greater than those of the clean samples.
  • Figure 4: Performance Comparison of Backdoor Removal Methods and Naive Fine-Tuning on Clean and Poisoned Datasets
  • Figure : (a) Poisoned Model
  • ...and 2 more figures