Beyond Full Poisoning: Effective Availability Attacks with Partial Perturbation

Yu Zhe, Jun Sakuma

TL;DR

The paper tackles the problem of protecting publicly shared datasets by enabling availability attacks even when only a portion of the data is perturbed. It introduces Parameter Matching Attack (PMA), a bi-level optimization framework that designs a destination model with poor clean-data performance and perturbs a subset of data so that models trained on mixed data resemble this destination, thereby degrading test accuracy. PMA remains effective across four benchmarks (SVHN, CIFAR-10, CIFAR-100, ImageNet) and under various surrogate/target architectures and countermeasures, requiring a perturbation size of at least $25/255$ and a poison ratio of at least $40\%$ for robust impact. The approach is practical in real-world settings and demonstrates strong resilience to static and adaptive defenses, highlighting important considerations for data-sharing and privacy protection in machine learning workflows.

Abstract

The widespread use of publicly available datasets for training machine learning models raises significant concerns about data misuse. Availability attacks have emerged as a means for data owners to safeguard their data by designing imperceptible perturbations that degrade model performance when incorporated into training datasets. However, existing availability attacks are ineffective when only a portion of the data can be perturbed. To address this challenge, we propose a novel availability attack approach termed Parameter Matching Attack (PMA). PMA is the first availability attack capable of causing more than a 30% performance drop when only a portion of data can be perturbed. PMA optimizes perturbations so that when the model is trained on a mixture of clean and perturbed data, the resulting model will approach a model designed to perform poorly. Experimental results across four datasets demonstrate that PMA outperforms existing methods, achieving significant model performance degradation when a part of the training data is perturbed. Our code is available in the supplementary materials.

Paper Structure

This paper contains 17 sections, 7 equations, 2 figures, 5 tables, 1 algorithm.

Figures (2)

  • Figure 1: Training on clean data yields high accuracy. Previous availability attacks add perturbations to all samples, making the dataset unlearnable and resulting in a model with low accuracy. Our approach requires perturbations on only part of the data to achieve unlearnability.
  • Figure 2: The data owner assigns wrong labels to $D_{\text{cl}}$, resulting in $D_{\text{dirty}}$, and adds perturbations to $D_{\text{cl}}$, resulting in $D_{\text{poi}}$. The perturbations are optimized so that the model trained on the mixture of $D_{\text{poi}}$ and $\tilde{\mathcal{D}}_{\text{extra}}$ gradually approaches the parameters of the model trained on $D_{\text{dirty}}$ and $\tilde{\mathcal{D}}_{\text{extra}}$. The data owner controls the datasets marked in blue; the dataset marked in orange is unknown to the data owner.
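
The pipeline in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a one-parameter linear model, a squared distance between final weights as the matching loss, and finite-difference gradients in place of backpropagation through training. Every name (`train`, `matching_loss`, `optimize_perturbation`, `eps`) and every data value is hypothetical, and the paper's actual objective may differ, for instance by matching parameters gradually along the training trajectory.

```python
# Toy sketch of parameter matching on a 1-D linear model y = w * x.
# All names and values are illustrative assumptions, not the paper's code.

def train(data, w0=0.0, lr=0.1, steps=20):
    """Fit w by gradient descent on mean squared error over (x, y) pairs."""
    w = w0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def matching_loss(delta, d_cl, d_extra, w_dest):
    """Squared distance between the poisoned model's weight and the destination."""
    # D_poi: clean inputs shifted by delta, original labels kept.
    d_poi = [(x + d, y) for (x, y), d in zip(d_cl, delta)]
    w_poi = train(d_poi + d_extra)
    return (w_poi - w_dest) ** 2


def optimize_perturbation(d_cl, d_extra, w_dest, eps=0.5, lr=0.5, iters=30, h=1e-4):
    """Projected gradient descent on delta, using forward finite differences."""
    delta = [0.0] * len(d_cl)
    for _ in range(iters):
        base = matching_loss(delta, d_cl, d_extra, w_dest)
        grads = []
        for i in range(len(delta)):
            bumped = list(delta)
            bumped[i] += h
            grads.append((matching_loss(bumped, d_cl, d_extra, w_dest) - base) / h)
        # Gradient step, then clip each entry back into [-eps, eps].
        delta = [max(-eps, min(eps, d - lr * g)) for d, g in zip(delta, grads)]
    return delta


d_cl = [(1.0, 2.0), (2.0, 4.0)]        # data the owner will perturb (true w = 2)
d_dirty = [(x, -y) for x, y in d_cl]   # same inputs with deliberately wrong labels
d_extra = [(1.5, 3.0)]                 # owner's stand-in for the unknown clean data

w_dest = train(d_dirty + d_extra)      # destination model, designed to perform poorly
delta = optimize_perturbation(d_cl, d_extra, w_dest)
loss_before = matching_loss([0.0] * len(d_cl), d_cl, d_extra, w_dest)
loss_after = matching_loss(delta, d_cl, d_extra, w_dest)
print(loss_before, loss_after, delta)
```

In this toy setting the bound `eps` limits how far the perturbation can pull the trained weight toward the destination, so the matching loss decreases without reaching zero. A realistic version would differentiate through the training steps of a neural network rather than use finite differences.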