
SoK: The Last Line of Defense: On Backdoor Defense Evaluation

Gorka Abad, Marina Krček, Stefanos Koffas, Behrad Tajalli, Marco Arazzi, Roberto Riaño, Xiaoyun Xu, Zhuoran Liu, Antonino Nocera, Stjepan Picek

TL;DR

The paper tackles the evaluation crisis in backdoor defense research by conducting a large-scale meta-analysis of 183 papers and performing over 3,000 experiments across 3 datasets, 4 architectures, 5 attacks, and 16 defenses. It reveals pervasive gaps including narrow threat models, toy datasets, limited trigger diversity, and scarce runtime reporting, which inflate perceived defense effectiveness. The authors implement a diverse benchmark, introduce the Mean Absolute Difference Score to assess hyperparameter stability, and provide actionable guidelines for dataset diversity, adaptive attacker evaluation, and transparent reporting. They also publish a public dataset and advocate for community leaderboards to enable sustainable, reproducible, and fair comparisons that support real-world deployment of backdoor defenses.
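This summary does not give the exact definition of the Mean Absolute Difference Score, so the snippet below is only a minimal sketch under a stated assumption. It treats the score as the average absolute deviation of one metric (for example, attack success rate in percent) across several hyperparameter settings, measured from the value obtained with the default setting. The function name `mean_absolute_difference_score` and this formulation are hypothetical, and the paper's definition may differ.

```python
from statistics import mean


def mean_absolute_difference_score(default_metric: float, variant_metrics: list[float]) -> float:
    """Hypothetical stability score (assumed definition, not the paper's):
    mean absolute deviation of a metric across hyperparameter variants
    from the value measured with the default hyperparameters.
    Lower values suggest a defense that is less sensitive to tuning."""
    if not variant_metrics:
        raise ValueError("need at least one hyperparameter variant")
    return mean(abs(m - default_metric) for m in variant_metrics)


# Hypothetical ASR values (%) for one defense: default setting and three variants.
score = mean_absolute_difference_score(5.0, [4.0, 9.0, 6.0])
print(score)  # deviations 1, 4, 1 -> mean 2.0
```

Under this reading, a defense whose results barely move when its hyperparameters change gets a score near zero, which is the kind of stability the TL;DR says the metric is meant to capture.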

Abstract

Backdoor attacks pose a significant threat to deep learning models by implanting hidden vulnerabilities that can be activated by malicious inputs. While numerous defenses have been proposed to mitigate these attacks, the heterogeneous landscape of evaluation methodologies hinders fair comparison between defenses. This work presents a systematic (meta-)analysis of backdoor defenses through a comprehensive literature review and empirical evaluation. We analyzed 183 backdoor defense papers published between 2018 and 2025 across major AI and security venues, examining the properties and evaluation methodologies of these defenses. Our analysis reveals significant inconsistencies in experimental setups, evaluation metrics, and threat model assumptions in the literature. Through extensive experiments involving three datasets (MNIST, CIFAR-100, ImageNet-1K), four model architectures (ResNet-18, VGG-19, ViT-B/16, DenseNet-121), 16 representative defenses, and five commonly used attacks, totaling over 3,000 experiments, we demonstrate that defense effectiveness varies substantially across different evaluation setups. We identify critical gaps in current evaluation practices, including insufficient reporting of computational overhead and behavior under benign conditions, bias in hyperparameter selection, and incomplete experimentation. Based on our findings, we provide concrete challenges and well-motivated recommendations to standardize and improve future defense evaluations. Our work aims to equip researchers and industry practitioners with actionable insights for developing, assessing, and deploying defenses to different systems.

Paper Structure

This paper contains 33 sections, 2 equations, 18 figures, 8 tables.

Figures (18)

  • Figure 1: Distribution of resources across different training phases, showing the relative occurrence of each resource type during pre-, in-, and post-training phases.
  • Figure 2: Distribution of the number of used datasets, attacks, defenses, and model types used in the analyzed defenses.
  • Figure 3: Overview of dataset, attack, defense, and model evaluation statistics.
  • Figure 4: Percentage per year of defenses that consider the effect of the defense when there is no attack.
  • Figure 5: Average clean accuracy (CA) and attack success rate (ASR) on CIFAR-100 across architectures.
  • ...and 13 more figures
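Figure 5 reports the two metrics most backdoor evaluations rely on. As a minimal sketch, assuming predictions are integer class labels: clean accuracy (CA) is the fraction of trigger-free test inputs classified correctly, and attack success rate (ASR) is the fraction of triggered inputs classified as the attacker's target label. The sketch follows a common convention of excluding samples whose true class already equals the target, though individual papers may compute ASR differently. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean (trigger-free) test inputs classified correctly."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and equally long")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int], true_labels: Sequence[int], target_label: int
) -> float:
    """Fraction of triggered inputs (true class != target) that the model
    assigns to the attacker's target label."""
    pairs = [(p, y) for p, y in zip(triggered_preds, true_labels) if y != target_label]
    if not pairs:
        raise ValueError("no non-target samples to evaluate")
    return sum(p == target_label for p, _ in pairs) / len(pairs)


# Hypothetical toy example with target class 3.
labels = [0, 1, 2, 4, 3]
clean_preds = [0, 1, 2, 0, 3]      # 4 of 5 correct
triggered_preds = [3, 3, 1, 3, 3]  # last sample (true class 3) is excluded from ASR
print(clean_accuracy(clean_preds, labels))                          # 0.8
print(attack_success_rate(triggered_preds, labels, target_label=3))  # 3 of 4 -> 0.75
```

A useful defense keeps CA close to that of the undefended model while driving ASR toward zero. Reporting both numbers together is what allows results such as those in Figure 5 to be compared across architectures.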