Table of Contents
Fetching ...

ProP: Efficient Backdoor Detection via Propagation Perturbation for Overparametrized Models

Tao Ren, Qiongxiu Li

TL;DR

ProP (Propagation Perturbation), a novel and scalable backdoor detection method that leverages statistical output distributions to identify backdoored models and their target classes without relying on exhausive optimization strategies, is proposed.

Abstract

Backdoor attacks pose significant challenges to the security of machine learning models, particularly for overparameterized models like deep neural networks. In this paper, we propose ProP (Propagation Perturbation), a novel and scalable backdoor detection method that leverages statistical output distributions to identify backdoored models and their target classes without relying on exhausive optimization strategies. ProP introduces a new metric, the benign score, to quantify output distributions and effectively distinguish between benign and backdoored models. Unlike existing approaches, ProP operates with minimal assumptions, requiring no prior knowledge of triggers or malicious samples, making it highly applicable to real-world scenarios. Extensive experimental validation across multiple popular backdoor attacks demonstrates that ProP achieves high detection accuracy and computational efficiency, outperforming existing methods. These results highlight ProP's potential as a robust and practical solution for backdoor detection.

ProP: Efficient Backdoor Detection via Propagation Perturbation for Overparametrized Models

TL;DR

ProP (Propagation Perturbation), a novel and scalable backdoor detection method that leverages statistical output distributions to identify backdoored models and their target classes without relying on exhausive optimization strategies, is proposed.

Abstract

Backdoor attacks pose significant challenges to the security of machine learning models, particularly for overparameterized models like deep neural networks. In this paper, we propose ProP (Propagation Perturbation), a novel and scalable backdoor detection method that leverages statistical output distributions to identify backdoored models and their target classes without relying on exhausive optimization strategies. ProP introduces a new metric, the benign score, to quantify output distributions and effectively distinguish between benign and backdoored models. Unlike existing approaches, ProP operates with minimal assumptions, requiring no prior knowledge of triggers or malicious samples, making it highly applicable to real-world scenarios. Extensive experimental validation across multiple popular backdoor attacks demonstrates that ProP achieves high detection accuracy and computational efficiency, outperforming existing methods. These results highlight ProP's potential as a robust and practical solution for backdoor detection.

Paper Structure

This paper contains 14 sections, 10 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Output distribution of benign model and backdoored model after applying the proposed ProP using five different backdoor attacks.