Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
Shanglun Feng, Florian Tramèr
TL;DR
The paper exposes a supply-chain risk: by tampering with pretrained weights, an attacker can plant privacy backdoors that extract finetuning data from downstream tasks, and that enable tight privacy attacks even when finetuning uses differential privacy. It introduces data-trap backdoors that capture a single training example and then shut off, enabling high-probability reconstruction with minimal impact on model utility. Extending the idea to transformers (ViT and BERT), the work designs a modular backdoor architecture with input, backdoor, amplifier, erasure, propagation, and output modules, and demonstrates attacks under both white-box and black-box access, including perfect membership inference and black-box data reconstruction via model extraction. The results imply that DP-SGD privacy guarantees can be nearly tight against end-to-end attackers when the model is backdoored, which undermines the practice of training with loose privacy budgets and calls for stricter protections in the ML supply chain. Overall, the work broadens the threat model for ML privacy: untrusted pretrained models can leak private finetuning data or allow it to be reconstructed.
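To make the data-trap idea concrete, here is a minimal single-neuron sketch in NumPy. It is a simplified illustration under stated assumptions, not the paper's construction: it assumes plain SGD with batch size 1, non-negative inputs, and a fixed positive upstream gradient (`upstream_grad`, a hypothetical stand-in for the gradient arriving from the rest of the network). All names and constants are illustrative. For a unit with pre-activation `w @ x + b`, the weight gradient is the upstream gradient times `x` and the bias gradient is the upstream gradient alone, so the ratio of the weight change to the bias change recovers `x`.

```python
import numpy as np

def sgd_step_on_trap(w, b, x, lr, upstream_grad):
    """One SGD step on a single ReLU unit h = relu(w @ x + b).

    upstream_grad is an assumed scalar dL/dh; in a real network it
    would come from backpropagation through later layers.
    """
    pre_activation = w @ x + b
    if pre_activation <= 0:
        # Inactive ReLU: zero gradient, so nothing is written.
        return w.copy(), b
    # dL/dw = dL/dh * x and dL/db = dL/dh for an active unit.
    return w - lr * upstream_grad * x, b - lr * upstream_grad

def reconstruct(w_before, b_before, w_after, b_after):
    """Recover the captured input as (delta w) / (delta b)."""
    return (w_after - w_before) / (b_after - b_before)

rng = np.random.default_rng(0)
dim = 8
# Five hypothetical finetuning inputs with entries in [0.1, 1.0).
xs = rng.uniform(0.1, 1.0, size=(5, dim))

# Illustrative trap: fires on any of these inputs, and the step size
# is large enough that the bias drops below -dim after one update,
# so the unit can never fire again on inputs in [0, 1]^dim.
w0, b0 = np.ones(dim), 0.0
lr, upstream_grad = 1.0, 10.0

w, b = w0.copy(), b0
for x in xs:
    w, b = sgd_step_on_trap(w, b, x, lr, upstream_grad)

recovered = reconstruct(w0, b0, w, b)
print(np.allclose(recovered, xs[0]))
```

In this toy setting only the first input ever updates the unit; the later four see an inactive ReLU and leave its weights untouched, which is why comparing the weights before and after finetuning recovers that first input.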
Abstract
Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a new risk of privacy backdoors. By tampering with a pretrained model's weights, an attacker can fully compromise the privacy of the finetuning data. We show how to build privacy backdoors for a variety of models, including transformers, which enable an attacker to reconstruct individual finetuning samples with guaranteed success. We further show that backdoored models allow for tight privacy attacks on models trained with differential privacy (DP). The common optimistic practice of training DP models with loose privacy guarantees is thus insecure if the model is not trusted. Overall, our work highlights a crucial and overlooked supply chain attack on machine learning privacy.
