Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

Yuxin Wen, Leo Marchyok, Sanghyun Hong, Jonas Geiping, Tom Goldstein, Nicholas Carlini

TL;DR

The paper reveals a privacy backdoor vulnerability: an attacker who poisons a pre-trained checkpoint can dramatically increase membership inference leakage once a victim fine-tunes the model. It formalizes a dual-objective poisoning mechanism that creates a loss differential between target and auxiliary data under stealth constraints. It then demonstrates the attack's effectiveness on CLIP and several large language models through extensive experiments and ablations. The results show substantial gains in membership inference attack (MIA) success with minimal degradation of downstream performance. The authors raise awareness of the risks posed by open-source checkpoints and advocate for stronger validation and integrity checks. Collectively, the work exposes a practical privacy risk in pre-trained foundation models and motivates new defenses and verification practices for distributed model hubs.
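The dual objective described above can be sketched as a single scalar loss that the attacker minimizes while poisoning the checkpoint. This is a minimal sketch under stated assumptions, not the paper's exact formulation. The name `poisoning_objective`, the weight `alpha`, and the choice to raise the mean target loss while keeping the mean auxiliary loss low are illustrative assumptions. The inputs are per-example losses the attacker would compute on the current checkpoint.

```python
from statistics import mean
from typing import Sequence


def poisoning_objective(target_losses: Sequence[float],
                        aux_losses: Sequence[float],
                        alpha: float = 1.0) -> float:
    """Hypothetical dual objective, to be MINIMIZED by the attacker.

    Assumed formulation (illustrative, not taken from the paper):
      - mean(aux_losses) keeps the checkpoint useful on general data,
        acting as the stealth term.
      - -alpha * mean(target_losses) pushes the loss on target data up,
        so that later fine-tuning on those points causes a large loss drop.
    """
    if not target_losses or not aux_losses:
        raise ValueError("need at least one target and one auxiliary loss")
    return mean(aux_losses) - alpha * mean(target_losses)


# Example: mean aux loss 0.5, mean target loss 3.0, alpha 0.5 -> 0.5 - 1.5
print(poisoning_objective([2.0, 4.0], [0.5, 0.5], alpha=0.5))  # -1.0
```

With a real model, the attacker would backpropagate through this quantity. Keeping the auxiliary term in the objective is intended to limit visible degradation of downstream performance, which is what makes the poisoned checkpoint hard to notice.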

Abstract

It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
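To make "leaked at a significantly higher rate" concrete, a common black-box membership inference baseline thresholds the fine-tuned model's per-example loss: points with unusually low loss are predicted to be training members. The sketch below assumes this loss-threshold rule. The function names and the threshold value are hypothetical, and the paper's own attack and inference strategies may differ.

```python
from typing import List, Sequence


def infer_membership(losses: Sequence[float], threshold: float) -> List[bool]:
    """Predict 'member' when the fine-tuned model's loss is below threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses: Sequence[float],
                    nonmember_losses: Sequence[float],
                    threshold: float) -> float:
    """Plain accuracy of the threshold rule over known members and non-members."""
    member_preds = infer_membership(member_losses, threshold)
    nonmember_preds = infer_membership(nonmember_losses, threshold)
    # Members are correct when flagged; non-members are correct when not flagged.
    correct = sum(member_preds) + sum(not p for p in nonmember_preds)
    total = len(member_losses) + len(nonmember_losses)
    return correct / total


# Two members and two non-members; one non-member (loss 0.15) is misclassified.
print(attack_accuracy([0.1, 0.2], [1.5, 0.15], threshold=0.5))  # 0.75
```

A privacy backdoor, as the abstract describes it, aims to widen the gap between member and non-member losses after fine-tuning. A wider gap lets even a simple rule like this one separate the two groups more accurately.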

Paper Structure (16 sections, 2 equations, 2 figures, 4 tables)