TMI! Finetuned Models Leak Private Information from their Pretraining Data

John Abascal; Stanley Wu; Alina Oprea; Jonathan Ullman

TL;DR

TMI is a novel metaclassifier-based attack that leverages the influence of memorized pretraining samples on predictions in the downstream task. It can successfully infer membership of pretraining examples using only query access to the finetuned model.

Abstract

Transfer learning has become an increasingly popular technique in machine learning as a way to leverage a pretrained model trained for one task to assist with building a finetuned model for a related task. This paradigm has been especially popular for $\textit{privacy}$ in machine learning, where the pretrained model is considered public, and only the data for finetuning is considered sensitive. However, there are reasons to believe that the data used for pretraining is still sensitive, making it essential to understand how much information the finetuned model leaks about the pretraining data. In this work we propose a new membership-inference threat model where the adversary only has access to the finetuned model and would like to infer the membership of the pretraining data. To realize this threat model, we implement a novel metaclassifier-based attack, $\textbf{TMI}$, that leverages the influence of memorized pretraining samples on predictions in the downstream task. We evaluate $\textbf{TMI}$ on both vision and natural language tasks across multiple transfer learning settings, including finetuning with differential privacy. Through our evaluation, we find that $\textbf{TMI}$ can successfully infer membership of pretraining examples using query access to the finetuned model. An open-source implementation of $\textbf{TMI}$ can be found on GitHub: https://github.com/johnmath/tmi-pets24.
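
The metaclassifier idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). It assumes that, for one challenge point, shadow models trained with (IN) and without (OUT) that point have already been finetuned and queried. The Dirichlet-sampled prediction vectors, the logistic-regression metaclassifier, and all names below (`logit_scale`, `train_metaclassifier`, `membership_score`) are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def logit_scale(p, eps=1e-6):
    """Map confidences in (0, 1) to the real line via log(p / (1 - p))."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p) - np.log(1.0 - p)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_metaclassifier(x, y, lr=0.05, steps=2000):
    """Fit a logistic regression by gradient descent; returns (weights, bias)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        grad = sigmoid(x @ w + b) - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def membership_score(w, b, x):
    """Metaclassifier output: higher means 'more likely IN'."""
    return sigmoid(x @ w + b)

# Synthetic stand-in for shadow-model outputs on ONE challenge point
# (assumption: 5 downstream labels; IN models lean toward label 2).
rng = np.random.default_rng(0)
n_shadow = 200
in_vecs = rng.dirichlet([1, 1, 8, 1, 1], size=n_shadow)   # point was in pretraining
out_vecs = rng.dirichlet([1, 1, 1, 1, 1], size=n_shadow)  # point was not

# Features are scaled prediction vectors; labels are 1 (IN) or 0 (OUT).
features = logit_scale(np.vstack([in_vecs, out_vecs]))
labels = np.concatenate([np.ones(n_shadow), np.zeros(n_shadow)])

w, b = train_metaclassifier(features, labels)
train_acc = ((membership_score(w, b, features) > 0.5) == labels).mean()

# Score fresh synthetic vectors from each world, plus one hypothetical
# prediction vector obtained by querying the target finetuned model.
in_scores = membership_score(w, b, logit_scale(rng.dirichlet([1, 1, 8, 1, 1], size=100)))
out_scores = membership_score(w, b, logit_scale(rng.dirichlet([1, 1, 1, 1, 1], size=100)))
target_vec = np.array([0.05, 0.05, 0.80, 0.05, 0.05])
target_score = float(membership_score(w, b, logit_scale(target_vec)))

print("train accuracy:", train_acc)
print("mean IN score:", in_scores.mean(), "mean OUT score:", out_scores.mean())
print("target membership score:", target_score)
```

The sketch uses the whole prediction vector over downstream labels as the feature, rather than a single confidence value, because the pretraining label of the challenge point need not exist in the downstream task.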

Paper Structure (43 sections, 5 theorems, 36 equations, 17 figures, 7 tables, 3 algorithms)

Introduction
Background and Related Work
Machine Learning Background and Notation
Scaling Model Confidences
Transfer Learning
Differential Privacy
Related Work
Privacy Attacks on Machine Learning Models
Membership-Inference Attacks
Threat Model
Methodology
Membership Inference Under Distribution Shift
Adapting an Existing Attack
Issues with Adapting LiRA
Our TMI Attack
...and 28 more sections

Key Result

Lemma 4.1

If $c$ is OUT, then and if $c$ is IN, then

Figures (17)

Figure 1: Our New Membership-Inference Threat Model.
Figure 2: Distribution of the Test Statistic, $z$, for Multiple Values of $\alpha$
Figure 3: AUC of our Membership-Inference Attack on Mean Estimation as a Function of $\alpha$.
Figure 4: Scaled Model Confidences of Shadow Models Finetuned on Caltech 101 at Multiple Labels when Queried on a Sample from the "Dugong" Class in Tiny Imagenet
Figure 5: TMI Attack Performance on Downstream Tasks When Pretrained CIFAR-100 Target Models are Finetuned Using Feature Extraction
...and 12 more figures

Theorems & Definitions (11)

Example 1.1
Example 1.2
Definition 2.1
Lemma 4.1
Lemma 4.2
Lemma A.1
proof
Lemma A.2
proof
Lemma A.3
...and 1 more
