Transferable Availability Poisoning Attacks
Yiyong Liu, Michael Backes, Xiao Zhang
TL;DR
This work addresses a realism gap in data poisoning: availability attacks designed for a single learning method transfer poorly when the victim is free to choose any algorithm, including contrastive and supervised learners. It introduces Transferable Poisoning (TP), which first exploits the alignment and uniformity properties of contrastive learning to craft poisons with strong intra-paradigm transfer, and then iteratively combines gradient signals from supervised and contrastive objectives via a shared backbone to approximate worst-case unlearnability across paradigms. TP demonstrates superior cross-learner transferability on CIFAR-10/100, TinyImageNet, and MiniImageNet, and remains effective across architectures and under partial poisoning. These results indicate that cross-paradigm poisoning can be substantially more threatening than previously thought, and they motivate defenses that account for transfer across learning methods and paradigms.
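To make the two contrastive properties concrete, the following is a minimal sketch of alignment and uniformity as they are commonly defined for normalized embeddings. It assumes the widely used formulation (mean distance between positive pairs, and the log of the mean Gaussian potential over all pairs); whether TP uses exactly these definitions is not stated in this summary. The function names and the defaults `alpha=2` and `t=2.0` are illustrative choices, not taken from the paper.

```python
import math
from itertools import combinations


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm (contrastive embeddings are usually normalized)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def alignment(positive_pairs, alpha=2):
    """Mean distance**alpha between embeddings of positive pairs.

    Lower values mean two views of the same sample map closer together.
    """
    total = sum(sq_dist(u, v) ** (alpha / 2) for u, v in positive_pairs)
    return total / len(positive_pairs)


def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct embedding pairs.

    Lower values mean the embeddings are spread more evenly on the sphere.
    """
    potentials = [
        math.exp(-t * sq_dist(u, v)) for u, v in combinations(embeddings, 2)
    ]
    return math.log(sum(potentials) / len(potentials))
```

Under these definitions, identical positive pairs give an alignment of zero, and embeddings that collapse to a single point give the worst possible uniformity of zero. Both quantities can be computed from embeddings alone, without labels.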
Abstract
We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model by crafting small perturbations to its training data. Existing poisoning strategies can achieve this goal, but they assume that the victim employs the same learning method that the adversary used to mount the attack. In this paper, we argue that this assumption is strong, since the victim may choose any learning algorithm to train the model as long as it achieves some targeted performance on clean data. Empirically, we observe a large decrease in the effectiveness of prior poisoning attacks when the victim employs an alternative learning algorithm. To enhance attack transferability, we propose Transferable Poisoning, which first leverages the intrinsic characteristics of alignment and uniformity to achieve better unlearnability within contrastive learning, and then iteratively uses gradient information from the supervised and unsupervised contrastive learning paradigms to generate the poisoning perturbations. Through extensive experiments on image benchmarks, we show that our transferable poisoning attack produces poisoned samples with significantly improved transferability, which remain effective not only against the two learners used to devise the attack but also against other learning algorithms and even other paradigms.
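The idea of iteratively drawing on gradients from two objectives can be illustrated with a toy sketch. The code below is not the authors' algorithm: it alternates signed-gradient steps between two stand-in gradient functions and projects the perturbation back into an L-infinity ball after every step. The signed step, the descent direction, the L-infinity budget `eps`, and the quadratic surrogate gradients are all assumptions made for illustration; in a real attack the two gradient functions would come from supervised and contrastive losses computed through a shared backbone.

```python
def clip(x, lo, hi):
    """Clamp a scalar to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def alternating_perturbation(grad_fns, dim, eps, step, rounds):
    """Alternate signed-gradient descent steps across several objectives.

    grad_fns: callables mapping a perturbation (list of floats) to a gradient.
    eps:      assumed L-infinity budget that keeps the perturbation small.
    """
    delta = [0.0] * dim
    for _ in range(rounds):
        for grad_fn in grad_fns:  # one step per objective, in turn
            grad = grad_fn(delta)
            delta = [
                clip(d - step * sign(g), -eps, eps)  # step, then project
                for d, g in zip(delta, grad)
            ]
    return delta


# Hypothetical stand-ins for the two paradigms: gradients of the quadratic
# losses ||delta - target||^2 with different targets.
def toy_supervised_grad(delta):
    target = [1.0, -1.0]
    return [2 * (d - t) for d, t in zip(delta, target)]


def toy_contrastive_grad(delta):
    target = [1.0, 1.0]
    return [2 * (d - t) for d, t in zip(delta, target)]


delta = alternating_perturbation(
    [toy_supervised_grad, toy_contrastive_grad],
    dim=2, eps=0.5, step=0.1, rounds=10,
)
```

In this toy setting the first coordinate, on which both objectives agree, is pushed to the edge of the budget, while the second coordinate, on which they disagree, stays near zero. This mirrors the intuition that alternating updates favor perturbation directions useful to both learners.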
