
A Linear Approach to Data Poisoning

Donald Flynn, Diego Granziol

TL;DR

The paper develops a random matrix theory framework to quantify the impact of targeted data poisoning on high-dimensional ridge least squares with an unpenalized intercept, in the proportional regime where $p,n\to\infty$ and $p/n\to c$. It derives closed-form asymptotic laws for the poisoned score: a Gaussian limit $\hat{\boldsymbol{\beta}}^{\top}(\mathbf{x}_0+\mathbf{v}) \xrightarrow{d} \mathcal{N}(\mu,\sigma^2)$ with explicit mean $\mu(\theta,\|\mathbf{v}\|,\lambda,c)$ and variance $\sigma^2(\theta,\|\mathbf{v}\|,\lambda,c)$. The mean captures alignment with the poisoning direction, and the variance depends on the data geometry through the Marčenko–Pastur transforms $m(z)$ and $\tilde{m}(z)$. The results yield scaling laws showing that, for a fixed poisoning rate $\theta$, robustness improves as the aspect ratio $c=p/n$ decreases, while regularization $\lambda$ partially counteracts imprinting. The framework also recovers the interpolation threshold $c\to1$ in the ridgeless limit $\lambda\to0$ and shows that the poisoning effect is linear in $\theta$ for small $\theta$. Synthetic experiments match the theory, and MNIST backdoor tests show qualitative agreement in the mean shift and in the trend of the variance. Overall, the work provides a tractable, governance-ready quantitative lens on poisoning in linear models and a foundation for extending the analysis to more complex settings.

Abstract

Backdoor and data-poisoning attacks can flip predictions with tiny training corruptions, yet a sharp theory linking poisoning strength, overparameterization, and regularization is lacking. We analyze ridge least squares with an unpenalized intercept in the high-dimensional regime \(p,n\to\infty\), \(p/n\to c\). Targeted poisoning is modelled by shifting a \(θ\)-fraction of one class by a direction \(\mathbf{v}\) and relabelling. Using resolvent techniques and deterministic equivalents from random matrix theory, we derive closed-form limits for the poisoned score explicit in the model parameters. The formulas yield scaling laws, recover the interpolation threshold as \(c\to1\) in the ridgeless limit, and show that the weights align with the poisoning direction. Synthetic experiments match theory across sweeps of the parameters and MNIST backdoor tests show qualitatively consistent trends. The results provide a tractable framework for quantifying poisoning in linear models.
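The poisoning model and estimator described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' code: balanced classes labelled $+1$ (source) and $-1$ (target) with means $\pm\mathbf{e}_1$ and identity covariance, a trigger $\mathbf{v}=\mathbf{e}_0$, a $1/n$ normalization of the squared loss, and the helper names `make_poisoned_data` and `ridge_with_intercept` are all hypothetical. The intercept stays unpenalized by centring $\mathbf{X}$ and $y$ before solving the ridge system and then setting $b=\bar y-\bar{\mathbf{x}}^{\top}\hat{\boldsymbol{\beta}}$.

```python
import numpy as np


def make_poisoned_data(n, p, theta, v, mean_shift, rng):
    """Hypothetical two-class Gaussian model with targeted poisoning.

    Assumptions (ours): balanced classes labelled +1 (source) and -1
    (target) with means +mean_shift / -mean_shift and identity covariance.
    A theta-fraction of the source class is shifted by v and relabelled -1.
    """
    y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
    X = rng.standard_normal((n, p)) + y[:, None] * mean_shift
    source = np.flatnonzero(y == 1.0)
    k = int(round(theta * source.size))
    poisoned = rng.choice(source, size=k, replace=False)
    X[poisoned] += v          # implant the trigger direction
    y[poisoned] = -1.0        # flip the label to the target class
    return X, y, poisoned


def ridge_with_intercept(X, y, lam):
    """Ridge least squares with an unpenalized intercept.

    Minimises (1/n)||y - X beta - b||^2 + lam ||beta||^2. For any beta the
    optimal b is mean(y) - mean(X) @ beta, so centring X and y removes the
    intercept from the penalized problem.
    """
    n, p = X.shape
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xc, yc = X - x_bar, y - y_bar
    beta = np.linalg.solve(Xc.T @ Xc / n + lam * np.eye(p), Xc.T @ yc / n)
    b = y_bar - x_bar @ beta
    return beta, b


rng = np.random.default_rng(0)
n, p, theta, lam = 2000, 200, 0.1, 0.1            # c = p / n = 0.1
v = np.zeros(p)
v[0] = 1.0                                        # hypothetical trigger, ||v|| = 1
mean_shift = np.zeros(p)
mean_shift[1] = 1.0                               # hypothetical class mean
X, y, poisoned = make_poisoned_data(n, p, theta, v, mean_shift, rng)
beta, b = ridge_with_intercept(X, y, lam)

x0 = rng.standard_normal(p) + mean_shift          # fresh source-class point
print("clean score   :", x0 @ beta + b)
print("poisoned score:", (x0 + v) @ beta + b)
print("beta . v      :", beta @ v)
```

Because the poisoned points carry $\mathbf{v}$ but the target label $-1$ in this convention, one would expect $\hat{\boldsymbol{\beta}}^{\top}\mathbf{v}<0$ here, i.e. the weights align with the poisoning direction up to the sign fixed by the label encoding.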

Paper Structure

This paper contains 29 sections, 6 theorems, 103 equations, 5 figures, 1 table.

Key Result

Theorem 4.1

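Under the standing assumptions on the clean data and the high-dimensional regime, as $n, p \to \infty$ with $p/n \to c \in (0, \infty)$, the scalar output of the poisoned regressor satisfies $\hat{\boldsymbol{\beta}}^{\top}(\mathbf{x}_0+\mathbf{v}) \xrightarrow{d} \mathcal{N}(\mu,\sigma^2)$, where $\mu$ and $\sigma^2$ are given in closed form in terms of $\theta$, $\|\mathbf{v}\|$, $\lambda$, $c$, the quantity $\tau^2=\|\mathbf{v}\|^2\,\frac{\theta}{2}\!\left(1-\frac{\theta}{2}\right)$, and $m(z)$, the Stieltjes transform of the Marčenko–Pastur distribution with parameter $c$, together with the related transform $\tilde{m}(z)$.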
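The Marčenko–Pastur transform in the theorem has a closed form on the negative real axis, which is where ridge resolvents are evaluated ($z=-\lambda$). The sketch below assumes a convention the statement above does not fix: $\mathbf{X}\in\mathbb{R}^{n\times p}$ with i.i.d. $\mathcal{N}(0,1)$ entries, $\mathbf{S}=\mathbf{X}^{\top}\mathbf{X}/n$, $c=p/n$, and $m(z)=\int (x-z)^{-1}\,dF(x)$; the paper's normalization, and its $\tilde{m}$, may differ. It compares the closed form with $\frac1p\operatorname{tr}(\mathbf{S}+\lambda\mathbf{I})^{-1}$ from a single draw and also evaluates $\tau^2$. The function names are hypothetical.

```python
import numpy as np


def mp_stieltjes(lam, c):
    """Marchenko-Pastur Stieltjes transform m(z) evaluated at z = -lam < 0.

    Assumed convention: S = X^T X / n with X of size n x p, i.i.d. N(0, 1)
    entries and c = p / n; m(z) = integral of dF(x) / (x - z). m solves
    c z m^2 + (z + c - 1) m + 1 = 0, and at z = -lam we keep the positive
    root because m(-lam) = E[1 / (x + lam)] > 0.
    """
    a = c - 1.0 - lam
    return (a + np.sqrt(a * a + 4.0 * c * lam)) / (2.0 * c * lam)


def empirical_stieltjes(n, p, lam, rng):
    """(1/p) tr (S + lam I)^{-1} for a single Gaussian draw of X."""
    X = rng.standard_normal((n, p))
    eigenvalues = np.linalg.eigvalsh(X.T @ X / n)
    return np.mean(1.0 / (eigenvalues + lam))


def tau_sq(theta, v_norm):
    """tau^2 = ||v||^2 (theta / 2) (1 - theta / 2), as in Theorem 4.1."""
    return v_norm**2 * (theta / 2.0) * (1.0 - theta / 2.0)


rng = np.random.default_rng(0)
for n, p, lam in [(1000, 500, 0.1), (400, 800, 0.5)]:   # c = 0.5 and c = 2
    c = p / n
    print(f"c={c}: theory {mp_stieltjes(lam, c):.4f}, "
          f"empirical {empirical_stieltjes(n, p, lam, rng):.4f}")
print("tau^2 at theta=0.1, ||v||=1:", tau_sq(0.1, 1.0))
```

For $c>1$ the transform includes the atom of mass $1-1/c$ at zero, which the closed form handles without a special case. Numerically, $\tau^2$ equals the variance of $\|\mathbf{v}\|B$ with $B\sim\mathrm{Bernoulli}(\theta/2)$; reading it as the spread of the shift along $\mathbf{v}$ when a $\theta/2$ share of a balanced pooled dataset is poisoned is our interpretation.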

Figures (5)

  • Figure 1: An example of the poisoning mechanism: a small perturbation is added to the top-left corner of the image. The left image is correctly labelled as "$1$", and the right image has its label changed to "$0$" in the poisoned dataset.
  • Figure 2: We plot empirical vs theoretical values for the derived $\eta$, $\mu$ and $\sigma$ under the synthetic data model. We perform a parameter sweep across $c$, $\lambda$, $\| \mathbf{v} \|$ and $\theta$, plotting the theoretical parameter against the empirical for each case. The circular marks are the averages over 100 independent trials, and the shaded region is the interquartile range.
  • Figure 3: Plot of the poisoning efficacy $\eta$ for the binary classifier on synthetic data, as the parameters vary. We use the fixed values $c=0.1$, $\lambda=0.1$, $\| \mathbf{v} \| = 1$ and $\theta=0.1$, and then vary each parameter in turn. The circles are the theoretical results, and the triangles are the experimental results, with the interquartile range shaded.
  • Figure 4: Top: empirical and theoretical results for $\mu$ in the synthetic data model. We fix the values $c=0.1$, $\|\mathbf{v}\| = 1$, $\theta = 0.1$, $\lambda = 0.1$, and then vary $c$, $\theta$ and $\lambda$. We fix $p=500$ and vary $n$ to obtain the required values of $c$. We plot the experimental mean and IQR across 100 independent samples. Bottom: we perform the poisoning and classification on MNIST, using the same values of $c$, $\| \mathbf{v} \|$, $\theta$ and $\lambda$ and varying the number of data points included to control $c$. The task is binary classification of the digits "$0$" and "$1$".
  • Figure 5: Comparison for $\sigma$. The experimental setup is the same as in Figure 4.
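The Monte Carlo protocol in these captions (fix $p$, set $n=p/c$, repeat over independent trials, report the mean and interquartile range) might look like the following. It is a hypothetical re-creation on an assumed two-class Gaussian model (class means $\pm\mathbf{e}_1$, trigger $\|\mathbf{v}\|\mathbf{e}_0$, identity covariance, labels $\pm1$), not the authors' experiment code. Each trial draws a fresh poisoned training set and a single fresh source-class test point carrying the trigger, and we choose to include the fitted intercept in the score. The names `poisoned_score` and `sweep_c` are illustrative.

```python
import numpy as np


def poisoned_score(n, p, theta, lam, rng, v_norm=1.0):
    """One trial: fit poisoned ridge (unpenalized intercept), score x0 + v.

    Hypothetical model: labels +/-1 in balanced classes with means +/-e_1,
    identity noise, trigger v = v_norm * e_0; a theta-fraction of the +1
    class is shifted by v and relabelled -1. Requires p >= 2.
    """
    y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
    X = rng.standard_normal((n, p))
    X[:, 1] += y                                  # class means +/- e_1
    source = np.flatnonzero(y == 1.0)
    k = int(round(theta * source.size))
    bad = rng.choice(source, size=k, replace=False)
    X[bad, 0] += v_norm                           # implant the trigger
    y[bad] = -1.0                                 # relabel to the target class
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xc = X - x_bar
    beta = np.linalg.solve(Xc.T @ Xc / n + lam * np.eye(p), Xc.T @ (y - y_bar) / n)
    b = y_bar - x_bar @ beta                      # unpenalized intercept
    x_test = rng.standard_normal(p)
    x_test[1] += 1.0                              # fresh source-class point x0 ...
    x_test[0] += v_norm                           # ... carrying the trigger v
    return x_test @ beta + b


def sweep_c(cs, p=500, theta=0.1, lam=0.1, trials=100, seed=0):
    """Fix p, set n = round(p / c), and summarise the scores over trials."""
    rng = np.random.default_rng(seed)
    rows = []
    for c in cs:
        n = int(round(p / c))
        scores = np.array([poisoned_score(n, p, theta, lam, rng) for _ in range(trials)])
        q1, q3 = np.percentile(scores, [25, 75])
        rows.append((c, n, scores.mean(), scores.std(ddof=1), q1, q3))
    return rows


if __name__ == "__main__":
    # Smaller than the captions' p = 500 and 100 trials, to keep the run short.
    for c, n, mean, sd, q1, q3 in sweep_c([0.1, 0.5, 1.0, 2.0], p=200, trials=20):
        print(f"c={c:<4} n={n:<5} mean={mean:+.3f} sd={sd:.3f} "
              f"IQR=[{q1:+.3f}, {q3:+.3f}]")
```

Sweeps over $\lambda$, $\theta$ or $\|\mathbf{v}\|$ follow the same pattern by looping over that argument instead of $c$.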

Theorems & Definitions (15)

  • Theorem 4.1: Asymptotic distribution under poisoning
  • Remark 4.2: Limit for vanishing regularization
  • Remark 4.3: Poisoning efficacy
  • Proposition 4.4: Alignment with poisoning vector
  • Lemma 5.1: Woodbury Formula (circa 1950)
  • Definition 1
  • Lemma A.1
  • Lemma A.2
  • Proof
  • Lemma A.3: Burkholder inequality (Bai and Silverstein, 2010, Lemma 2.13)
  • ...and 5 more
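Lemma 5.1 is the Woodbury identity $(\mathbf{A}+\mathbf{U}\mathbf{C}\mathbf{V})^{-1}=\mathbf{A}^{-1}-\mathbf{A}^{-1}\mathbf{U}(\mathbf{C}^{-1}+\mathbf{V}\mathbf{A}^{-1}\mathbf{U})^{-1}\mathbf{V}\mathbf{A}^{-1}$. A quick numerical check is sketched below for a low-rank update of a ridge-type matrix $\mathbf{X}^{\top}\mathbf{X}/n+\lambda\mathbf{I}$, the kind of update that arises when a few samples (for instance poisoned ones) are added. How the paper applies the lemma is not shown here, and `woodbury_inverse` is a hypothetical helper.

```python
import numpy as np


def woodbury_inverse(A_inv, U, C, V):
    """Return (A + U C V)^{-1} from a known A^{-1} via the Woodbury identity.

    Only a k x k system is solved, where k = U.shape[1] is the rank of the
    update, so this is cheap when k is much smaller than A's dimension.
    """
    AU = A_inv @ U                               # A^{-1} U, shape (p, k)
    inner = np.linalg.inv(C) + V @ AU            # C^{-1} + V A^{-1} U, shape (k, k)
    return A_inv - AU @ np.linalg.solve(inner, V @ A_inv)


rng = np.random.default_rng(0)
n, p, k, lam = 200, 50, 3, 0.1
X = rng.standard_normal((n, p))
A = X.T @ X / n + lam * np.eye(p)                # ridge-type matrix
U = rng.standard_normal((p, k))                  # k hypothetical extra samples as columns
C = np.eye(k) / n                                # matching 1/n weighting
fast = woodbury_inverse(np.linalg.inv(A), U, C, U.T)
direct = np.linalg.inv(A + U @ C @ U.T)
print("max abs difference:", np.max(np.abs(fast - direct)))
```

With $k=1$ this reduces to the Sherman–Morrison rank-one update commonly used in leave-one-out resolvent arguments.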