A Linear Approach to Data Poisoning
Donald Flynn, Diego Granziol
TL;DR
The paper develops a random matrix theory framework to quantify the impact of targeted data poisoning on high-dimensional ridge least squares with an unpenalized intercept, in the proportional regime where $p,n\to\infty$ and $p/n\to c$. It derives closed-form asymptotic laws for the poisoned score: a Gaussian limit $\hat{\boldsymbol{\beta}}^{\top}(\mathbf{x}_0+\mathbf{v}) \xrightarrow{d} \mathcal{N}(\mu,\sigma^2)$ with explicit mean $\mu(\theta,\|\mathbf{v}\|,\lambda,c)$ and variance $\sigma^2(\theta,\|\mathbf{v}\|,\lambda,c)$, where the mean captures alignment with the poisoning direction and the variance depends on the data geometry through the Marčenko–Pastur transforms $m(z)$ and $\tilde{m}(z)$. The results yield scaling laws showing that larger models (smaller $c$) improve robustness for fixed poisoning rate $\theta$, while regularization $\lambda$ partially counteracts imprinting. The framework also recovers the interpolation threshold as $c\to1$ in the ridgeless limit $\lambda\to0$ and shows that the poisoning effect is linear in $\theta$ for small $\theta$. Synthetic experiments match the theory, and MNIST backdoor tests agree qualitatively in both the mean shift and the variance trend. Overall, the work provides a tractable, quantitative account of poisoning in linear models and a foundation for extending the analysis to more complex settings.
Abstract
Backdoor and data-poisoning attacks can flip predictions with tiny training corruptions, yet a sharp theory linking poisoning strength, overparameterization, and regularization is lacking. We analyze ridge least squares with an unpenalized intercept in the high-dimensional regime \(p,n\to\infty\), \(p/n\to c\). Targeted poisoning is modelled by shifting a \(\theta\)-fraction of one class by a direction \(\mathbf{v}\) and relabelling. Using resolvent techniques and deterministic equivalents from random matrix theory, we derive closed-form limits for the poisoned score explicit in the model parameters. The formulas yield scaling laws, recover the interpolation threshold as \(c\to1\) in the ridgeless limit, and show that the weights align with the poisoning direction. Synthetic experiments match theory across sweeps of the parameters and MNIST backdoor tests show qualitatively consistent trends. The results provide a tractable framework for quantifying poisoning in linear models.
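The setup described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes isotropic Gaussian class-conditional data with a hypothetical class mean `mu`, labels in {+1, -1}, and the loss normalization (1/n)||y - b0 - X b||^2 + lam ||b||^2, none of which are specified in the text above. The function names `make_poisoned_data` and `ridge_unpenalized_intercept` are illustrative.

```python
import numpy as np

def make_poisoned_data(n, p, theta, v, mu, rng):
    """Two-class Gaussian data (assumed model): a theta-fraction of the
    -1 class is shifted by v and relabelled as +1."""
    y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
    X = rng.standard_normal((n, p)) + y[:, None] * mu
    neg = np.flatnonzero(y < 0)
    k = int(round(theta * neg.size))          # number of poisoned samples
    idx = rng.choice(neg, size=k, replace=False)
    X[idx] += v                               # shift by the poisoning direction
    y[idx] = 1.0                              # relabel to the target class
    return X, y, idx

def ridge_unpenalized_intercept(X, y, lam):
    """Minimise (1/n)||y - b0 - X b||^2 + lam ||b||^2 over (b0, b).
    Leaving b0 unpenalized is equivalent to centring X and y first."""
    n, p = X.shape
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xc, yc = X - x_bar, y - y_bar
    beta = np.linalg.solve(Xc.T @ Xc / n + lam * np.eye(p), Xc.T @ yc / n)
    b0 = y_bar - x_bar @ beta
    return beta, b0

# Illustrative parameters (assumptions, chosen only for the demo).
rng = np.random.default_rng(0)
n, p, theta, lam = 400, 200, 0.1, 0.5         # aspect ratio c = p/n = 0.5
mu = np.zeros(p); mu[0] = 1.0                 # hypothetical class mean
v = np.zeros(p); v[1] = 3.0                   # hypothetical poisoning direction

X, y, _ = make_poisoned_data(n, p, theta, v, mu, rng)
beta, b0 = ridge_unpenalized_intercept(X, y, lam)

x0 = rng.standard_normal(p) - mu              # fresh clean point from the -1 class
print("clean score    beta^T x0       :", beta @ x0)
print("poisoned score beta^T (x0 + v) :", beta @ (x0 + v))
print("alignment      beta^T v        :", beta @ v)
```

Repeating the last step over many independent draws of the training set and `x0` gives an empirical distribution of the poisoned score, which is the quantity the asymptotic theory describes; sweeping `theta`, `lam`, and `p / n` is one way to probe the dependence on each parameter.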
