Fragile Giants: Understanding the Susceptibility of Models to Subpopulation Attacks

Isha Gupta, Hidde Lycklama, Emanuel Opel, Evan Rose, Anwar Hithnawi

TL;DR

This work introduces a theoretical framework that explains how overparameterized models, due to their large capacity, can inadvertently memorize and misclassify targeted subpopulations, and it shows a clear empirical trend: models with more parameters are significantly more vulnerable to subpopulation poisoning.

Abstract

As machine learning models become increasingly complex, concerns about their robustness and trustworthiness have become more pressing. A critical vulnerability of these models is their exposure to data poisoning attacks, in which adversaries deliberately alter training data to degrade model performance. One particularly stealthy form of these attacks is subpopulation poisoning, which targets distinct subgroups within a dataset while leaving overall performance largely intact. The ability of these attacks to generalize within subpopulations poses a significant risk in real-world settings, as they can be exploited to harm marginalized or underrepresented groups within the dataset. In this work, we investigate how model complexity influences susceptibility to subpopulation poisoning attacks. We introduce a theoretical framework that explains how overparameterized models, due to their large capacity, can inadvertently memorize and misclassify targeted subpopulations. To validate our theory, we conduct extensive experiments on large-scale image and text datasets using popular model architectures. Our results show a clear trend: models with more parameters are significantly more vulnerable to subpopulation poisoning. Moreover, we find that attacks on smaller, human-interpretable subgroups often go undetected by these models. These results highlight the need to develop defenses that specifically address subpopulation vulnerabilities.
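To make the attack described in the abstract concrete, here is a minimal sketch of label-flipping subpopulation poisoning. It is an illustration under stated assumptions, not the authors' implementation: labels are taken to be binary, subgroup membership is given as a boolean mask, and `alpha` is assumed to be the number of injected points relative to the subgroup size. The function names `poison_subpopulation` and `target_damage` are hypothetical, and the damage metric (drop in subgroup accuracy) is one plausible reading of the term rather than the paper's stated definition.

```python
import numpy as np

def poison_subpopulation(X, y, in_subgroup, alpha, rng):
    """Append label-flipped copies of subgroup points to (X, y).

    Assumption: alpha scales the poison count by the subgroup size,
    and labels are binary (0/1).
    """
    idx = np.flatnonzero(in_subgroup)
    n_poison = int(round(alpha * idx.size))
    # Sample with replacement so alpha > 1 is possible.
    picked = rng.choice(idx, size=n_poison, replace=True)
    X_poison = X[picked]
    y_poison = 1 - y[picked]  # flip the binary label
    return np.concatenate([X, X_poison]), np.concatenate([y, y_poison])

def target_damage(clean_pred, poisoned_pred, y_true, in_subgroup):
    """Drop in subgroup accuracy between a clean and a poisoned model (assumed metric)."""
    clean_acc = np.mean(clean_pred[in_subgroup] == y_true[in_subgroup])
    poisoned_acc = np.mean(poisoned_pred[in_subgroup] == y_true[in_subgroup])
    return clean_acc - poisoned_acc

# Toy data: 100 points, the first 10 form the targeted subgroup (all labeled 0).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.zeros(100, dtype=int)
y[50:] = 1
in_subgroup = np.zeros(100, dtype=bool)
in_subgroup[:10] = True

X_poisoned, y_poisoned = poison_subpopulation(X, y, in_subgroup, alpha=2.0, rng=rng)
```

Because the injected points share features with the subgroup but carry the opposite label, a model with enough capacity to fit them can change its predictions on that subgroup while the rest of the data is largely unaffected.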

Paper Structure

This paper contains 14 sections, 1 theorem, 6 equations, 6 figures, and 2 tables.

Key Result

Theorem 1

Let $\mathcal{A}$ be a $\delta$-local subpopulation mixture learner for a noisy $k$-subpopulation mixture distribution $\mathcal{D}$ consisting of $k$ subpopulations $\mathcal{D}_1, \dots, \mathcal{D}_k$ with mixture coefficients $\gamma_1, \dots, \gamma_k$, that the mini...

Figures (6)

  • Figure 1: Comparison of decision boundary shifts caused by a poisoning attack targeting a subgroup (red points) with $\alpha=2.0$. The background (green-pink) represents the clean model's decision regions, while the blue line shows the boundary after the poisoning attack. The decision boundary is approximated by classifying a mesh of points across the grid.
  • Figure 2: Average Target Damage across subgroups for increasing attack intensity.
  • Figure 3: Relationship between subgroup size and target damage at $\alpha = 2.0$.
  • Figure 4: CivilComments subgroup-level damage analysis.
  • Figure 5: CelebA subgroup-level damage analysis.
  • ...and 1 more figure
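The Figure 1 caption notes that the decision boundary is approximated by classifying a mesh of points across the grid. A minimal sketch of that idea follows, assuming a two-dimensional feature space and a `predict` callable that maps an array of points to class labels. The helper names `decision_regions` and `boundary_mask` and the toy linear classifier are hypothetical and stand in for whatever model the figure actually uses.

```python
import numpy as np

def decision_regions(predict, x_range, y_range, resolution=200):
    """Classify every point of a regular 2-D mesh and return the label grid."""
    xs = np.linspace(*x_range, resolution)
    ys = np.linspace(*y_range, resolution)
    xx, yy = np.meshgrid(xs, ys)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    labels = predict(grid).reshape(xx.shape)
    return xx, yy, labels

def boundary_mask(labels):
    """Mark mesh cells whose next neighbor along either axis has a different label."""
    mask = np.zeros(labels.shape, dtype=bool)
    mask[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    mask[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return mask

def toy_predict(points):
    """Stand-in classifier (assumption): class 1 where x + y > 0."""
    return (points[:, 0] + points[:, 1] > 0).astype(int)

xx, yy, labels = decision_regions(toy_predict, (-1, 1), (-1, 1), resolution=50)
mask = boundary_mask(labels)
```

Running the same mesh through a clean and a poisoned model and comparing the two label grids shows where the boundary moved. The grids can be rendered with a filled contour plot, for example matplotlib's `contourf(xx, yy, labels)`; a finer `resolution` gives a smoother approximation at higher cost.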

Theorems & Definitions (5)

  • Definition 1: Noisy $k$-Subpopulation Mixture Distribution (Jagielski et al.)
  • Definition 2: Label Flipping Subpopulation Poisoning Attack
  • Definition 3: $\delta$-local Subpopulation Mixture Learner
  • Theorem 1
  • proof