Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies

Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak

TL;DR

This survey addresses backdoor threats in image recognition by systematically evaluating sixteen mitigation strategies against eight backdoor attacks, across three datasets, four model architectures, and three poisoning ratios (1%, 5%, and 10%). It finds that while many defenses provide some protection, their effectiveness varies considerably, and most newer methods do not consistently outperform the two seminal baselines. The study highlights three challenges: data efficiency, hyperparameter sensitivity, and the difficulty of restoring correct classification of backdoor inputs (RA/RDR) even when the attack success rate (ASR) is reduced. The findings point to a need for more robust, adaptable defenses that generalize across attack types, data settings, and model families, with future work focusing on improving recovery and reducing dependence on trigger assumptions.

Abstract

The widespread adoption of deep learning across various industries has introduced substantial challenges, particularly in terms of model explainability and security. The inherent complexity of deep learning models, while contributing to their effectiveness, also renders them susceptible to adversarial attacks. Among these, backdoor attacks are especially concerning, as they involve surreptitiously embedding specific triggers within training data, causing the model to exhibit aberrant behavior when presented with input containing the triggers. Such attacks often exploit vulnerabilities in outsourced processes, compromising model integrity without affecting performance on clean (trigger-free) input data. In this paper, we present a comprehensive review of existing mitigation strategies designed to counter backdoor attacks in image recognition. We provide an in-depth analysis of the theoretical foundations, practical efficacy, and limitations of these approaches. In addition, we conduct an extensive benchmarking of sixteen state-of-the-art approaches against eight distinct backdoor attacks, utilizing three datasets, four model architectures, and three poisoning ratios. Our results, derived from 122,236 individual experiments, indicate that while many approaches provide some level of protection, their performance can vary considerably. Furthermore, when compared to two seminal approaches, most newer approaches do not demonstrate substantial improvements in overall performance or consistency across diverse settings. Drawing from these findings, we propose potential directions for developing more effective and generalizable defensive mechanisms in the future.
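To make the attack setting concrete, the following is a minimal sketch of patch-style data poisoning at the three poisoning ratios used in the benchmark. It is an illustration under stated assumptions, not the survey's code: the function names `add_patch_trigger` and `poison_dataset`, the 8x8 toy images, the patch size, and the target label are all hypothetical, and only the yellow bottom-left square follows the Figure 1 caption.

```python
import random

YELLOW = (255, 255, 0)  # trigger colour, following the Figure 1 caption


def add_patch_trigger(image, size=3, color=YELLOW):
    """Return a copy of `image` with a square patch in the bottom-left corner.

    `image` is a list of rows, each row a list of RGB tuples.
    Assumes `size` does not exceed the image height or width.
    """
    height = len(image)
    stamped = [list(row) for row in image]  # copy rows so the input is untouched
    for r in range(height - size, height):
        for c in range(size):
            stamped[r][c] = color
    return stamped


def poison_dataset(images, labels, ratio, target_label, seed=0):
    """Stamp the trigger on a `ratio` fraction of samples and relabel them.

    Returns the poisoned images, the poisoned labels, and the set of
    indices that were modified.
    """
    rng = random.Random(seed)
    n_poison = round(ratio * len(images))
    chosen = set(rng.sample(range(len(images)), n_poison))
    out_images = [
        add_patch_trigger(img) if i in chosen else img
        for i, img in enumerate(images)
    ]
    out_labels = [
        target_label if i in chosen else y for i, y in enumerate(labels)
    ]
    return out_images, out_labels, chosen


# Toy data (hypothetical): 100 black 8x8 images with 10 classes.
BLACK = (0, 0, 0)
images = [[[BLACK] * 8 for _ in range(8)] for _ in range(100)]
labels = [i % 10 for i in range(100)]

for ratio in (0.01, 0.05, 0.10):
    _, _, chosen = poison_dataset(images, labels, ratio, target_label=0)
    print(f"ratio={ratio:.2f} -> {len(chosen)} poisoned samples")
```

A model trained on such data can learn to associate the patch with the target label while behaving normally on clean inputs, which is the behavior the surveyed defenses aim to remove. Many of the attacks in the benchmark use triggers other than a fixed patch, so this sketch covers only the simplest case.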

Paper Structure

This paper contains 64 sections, 36 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Example of a backdoor image (right) and its corresponding clean image (left). A yellow square, serving as the trigger, has been added to the bottom left corner of the backdoor image.
  • Figure 2: Examples of different backdoor triggers used in the literature. Note that while IAB adds a local patch to each image, its position and scale can vary across images.
  • Figure 3: Threat models considered by existing backdoor attacks.
  • Figure 4: Examples of different IAB trigger patterns.
  • Figure 5: Visual representation of how a model can be segmented using the hypothesis introduced in BadNets (Gu et al., 2017). Grey and white distinguish between the backdoor and clean components, respectively.
  • ...and 9 more figures