BDefects4NN: A Backdoor Defect Database for Controlled Localization Studies in Neural Networks

Yisong Xiao, Aishan Liu, Xinwei Zhang, Tianyuan Zhang, Tianlin Li, Siyuan Liang, Xianglong Liu, Yang Liu, Dacheng Tao

TL;DR

This work tackles the security risk posed by backdoor defects in third-party pre-trained DNNs by introducing BDefects4NN, a neuron-level backdoor defect database that enables controlled defect localization and repair studies. Built from four backdoor attacks across four architectures and three datasets, the database contains 1,654 infected DNNs with ground-truth defect labels, organized into 48 directories across four defect-quantity levels. The authors evaluate six fault localization criteria (four backdoor-specific and two general) and two repair methods (neuron pruning and neuron fine-tuning). They find that current localization approaches have limited effectiveness against backdoor defects, although weight-based criteria perform best and yield meaningful repair improvements. They further demonstrate practical threats through case studies in lane detection and large language models, underscoring the challenges of precise localization in safety-critical, real-world settings. Overall, BDefects4NN provides a benchmark and toolkit for advancing neuron-level backdoor localization and repair, supporting the safer deployment of third-party DNNs in AI systems.
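
To make the evaluation setup concrete, the sketch below shows how one weight-based localization criterion could be scored against BDefects4NN-style ground-truth labels. It is a minimal pure-Python illustration under stated assumptions, not the authors' code. The criterion (L1 norm of each neuron's incoming weights), the toy layer, the infected set, and the helper names are all hypothetical. Precision and recall of the top-k neurons is only one possible way to compare a localization result with the labels.

```python
from typing import List, Sequence, Set, Tuple


def l1_weight_scores(weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical weight-based criterion: score each neuron by the L1 norm
    of its incoming weight vector (weights[i] feeds neuron i)."""
    return [sum(abs(w) for w in row) for row in weights]


def top_k_neurons(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring neurons; ties broken by lower index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def precision_recall(predicted: Sequence[int],
                     ground_truth: Set[int]) -> Tuple[float, float]:
    """Compare localized neurons with ground-truth infected-neuron labels."""
    hits = len(set(predicted) & ground_truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up 5-neuron layer; neurons 1 and 3 are labeled as infected.
    layer = [
        [0.1, -0.2, 0.1],   # neuron 0, L1 norm 0.4
        [1.5, -0.9, 0.8],   # neuron 1, L1 norm 3.2
        [0.3, 0.3, -0.3],   # neuron 2, L1 norm 0.9
        [0.2, 0.1, 0.0],    # neuron 3, L1 norm 0.3
        [0.9, 0.7, -0.6],   # neuron 4, L1 norm 2.2
    ]
    infected = {1, 3}
    ranked = top_k_neurons(l1_weight_scores(layer), k=len(infected))
    print(ranked)                              # [1, 4]
    print(precision_recall(ranked, infected))  # (0.5, 0.5)
```

In this toy layer the criterion recovers neuron 1 but ranks the clean neuron 4 above the infected neuron 3, which shows how a criterion can find some infected neurons while missing others. Setting k to the number of labeled neurons makes precision and recall coincide.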

Abstract

Large pre-trained deep learning models now serve as the dominant building block for downstream middleware users and have transformed the learning paradigm, replacing the traditional practice of training models locally from scratch. To reduce development costs, developers often integrate third-party pre-trained deep neural networks (DNNs) into their intelligent software systems. However, using untrusted DNNs poses significant security risks, because these models may contain intentional backdoor defects introduced during the black-box training process. These backdoor defects can be activated by hidden triggers, allowing attackers to maliciously control the model and compromise the overall reliability of the intelligent software. To ensure the safe adoption of DNNs in critical software systems, it is crucial to study how such defects can be localized, yet no backdoor defect database has existed to support such studies. This paper addresses this gap by introducing BDefects4NN, the first backdoor defect database, which provides labeled backdoor-defected DNNs at neuron granularity and enables controlled localization studies of defect root causes. In BDefects4NN, we define three defect injection rules and employ four representative backdoor attacks across four popular network architectures and three widely adopted datasets, yielding a comprehensive database of 1,654 backdoor-defected DNNs with four defect quantities and varying numbers of infected neurons. Based on BDefects4NN, we conduct extensive experiments to evaluate six fault localization criteria and two defect repair techniques, finding that they have limited effectiveness against backdoor defects. Additionally, we investigate backdoor-defected models in practical scenarios, specifically lane detection for autonomous driving and large language models (LLMs), revealing potential threats and highlighting the current limitations of precise defect localization.
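
Neuron pruning, one of the two repair techniques evaluated in this work, can be sketched on a toy network. The code below is a hypothetical pure-Python illustration, not the paper's implementation. The two-layer network, its hand-set weights, the trigger feature, and the pruned neuron index are all invented for the example. In practice, the neurons to prune would come from a localization criterion. A repair is typically judged by how far it lowers the attack success rate while preserving clean accuracy.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def relu_layer(weights: Matrix, bias: Sequence[float],
               x: Sequence[float]) -> List[float]:
    """One fully connected layer followed by ReLU; weights[i] feeds neuron i."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def classify(hidden: Sequence[float], head: Matrix) -> int:
    """Linear output head (no bias); returns the index of the largest logit."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in head]
    return max(range(len(logits)), key=logits.__getitem__)


def prune_neurons(weights: Matrix, bias: Sequence[float],
                  neurons: Sequence[int]) -> Tuple[Matrix, List[float]]:
    """Return pruned copies of (weights, bias): each listed neuron gets zero
    incoming weights and zero bias, so its ReLU output is always 0."""
    pruned_w = [list(row) for row in weights]
    pruned_b = list(bias)
    for i in neurons:
        pruned_w[i] = [0.0] * len(pruned_w[i])
        pruned_b[i] = 0.0
    return pruned_w, pruned_b


if __name__ == "__main__":
    # Hypothetical toy model: inputs are [feature_a, feature_b, trigger].
    # Hidden neuron 2 reacts strongly to the trigger (the "infected" neuron).
    hidden_w = [[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 5.0]]
    hidden_b = [0.0, 0.0, 0.0]
    # Output head: class 2 plays the role of the attacker's target class.
    head = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

    clean = [1.0, 0.2, 0.0]
    triggered = [1.0, 0.2, 1.0]

    print(classify(relu_layer(hidden_w, hidden_b, triggered), head))  # 2

    pw, pb = prune_neurons(hidden_w, hidden_b, neurons=[2])
    print(classify(relu_layer(pw, pb, triggered), head))  # 0
    print(classify(relu_layer(pw, pb, clean), head))      # 0
```

Because this toy head routes the target class only through neuron 2, pruning it removes the backdoor without changing the clean prediction. In real networks, backdoor behavior may be distributed across many neurons, so pruning an imprecisely localized set may leave the backdoor intact or degrade clean accuracy.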

Paper Structure

This paper contains 22 sections, 11 equations, 6 figures, 8 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Overview of the BDefects4NN framework. Targeting the image classification task, BDefects4NN uses three rules to inject neuron-level backdoors into DNNs and builds 1,654 DNNs with backdoor defects, which support the evaluation of fault localization methods and defect repair techniques.
  • Figure 2: Defect distribution of the BDefects4NN database.
  • Figure 3: Performance of infected models across four quantity levels and four attacks on CIFAR-10.
  • Figure 4: Average correlation rate (%) of infected models across three datasets and four backdoor attacks. $Cor.I$ and $Cor.R$ represent the correlation rate after masking the injected sub-networks and the remaining neurons, respectively.
  • Figure 5: Effectiveness of six localization methods against each backdoor attack on the CIFAR-10 dataset.