Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions
Antonio Emanuele Cinà, Kathrin Grosse, Sebastiano Vascon, Ambra Demontis, Battista Biggio, Fabio Roli, Marcello Pelillo
TL;DR
This paper introduces backdoor learning curves as a unifying framework to explain backdoor poisoning through incremental learning and influence functions. It identifies three key factors that govern how quickly a model learns to associate a trigger with the attacker-chosen class: model complexity (controlled by hyperparameters), the fraction of poisoned samples, and the size and visibility of the backdoor trigger. The authors define the backdoor learning slope, relate it to influence functions, and propose parameter-deviation metrics to study how learning a backdoor alters model weights. A notable finding is the existence of a region in hyperparameter space where clean accuracy remains high while backdoor vulnerability is low, suggesting practical defenses via regularization and careful hyperparameter selection.
Abstract
Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine learning model to output an attacker-chosen class when presented with a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their effectiveness are still not well understood. In this work, we provide a unifying framework to study the process of backdoor learning through the lens of incremental learning and influence functions. We show that the effectiveness of backdoor attacks depends on: (i) the complexity of the learning algorithm, controlled by its hyperparameters; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger. These factors affect how fast a model learns to correlate the presence of the backdoor trigger with the target class. Our analysis unveils the intriguing existence of a region in the hyperparameter space in which the accuracy on clean test samples is still high while backdoor attacks are ineffective, thereby suggesting novel criteria to improve existing defenses.
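To make the attack setting concrete, the following is a minimal sketch of how a backdoor-poisoned training set might be constructed. It is not the authors' implementation: the square corner patch, the blending rule, and the names `add_trigger`, `poison_dataset`, `patch_size`, and `alpha` are illustrative assumptions. The arguments `fraction`, `patch_size`, and `alpha` loosely correspond to the fraction of backdoor samples, the trigger size, and the trigger visibility discussed above.

```python
import numpy as np


def add_trigger(images, patch_size=3, alpha=1.0, value=1.0):
    """Blend a square patch into the bottom-right corner of each image.

    Assumes `images` is a float array of shape (n, height, width) in [0, 1].
    `patch_size` stands in for trigger size and `alpha` for trigger
    visibility (both are hypothetical parameter names).
    """
    out = images.copy()
    region = out[:, -patch_size:, -patch_size:]
    out[:, -patch_size:, -patch_size:] = (1.0 - alpha) * region + alpha * value
    return out


def poison_dataset(X, y, fraction, target_class, patch_size=3, alpha=1.0, seed=0):
    """Return a copy of (X, y) in which a fraction of samples carry the
    trigger and are relabeled to the attacker-chosen class."""
    rng = np.random.default_rng(seed)
    n_poison = int(round(fraction * len(X)))
    idx = rng.choice(len(X), size=n_poison, replace=False)

    X_p, y_p = X.copy(), y.copy()
    X_p[idx] = add_trigger(X[idx], patch_size=patch_size, alpha=alpha)
    y_p[idx] = target_class
    return X_p, y_p, idx
```

Under these assumptions, sweeping `fraction`, `patch_size`, and `alpha` while retraining a model with different regularization strengths would be one way to explore the three factors named in the abstract.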
