Double-Dip: Thwarting Label-Only Membership Inference Attacks with Transfer Learning and Randomization

Arezoo Rajabi; Reeya Pimple; Aiswarya Janardhanan; Surudhi Asokraj; Bhaskar Ramasubramanian; Radha Poovendran

TL;DR

This paper tackles privacy risks from label-only membership inference attacks (MIAs) on overfitted DNNs by introducing Double-Dip, a two-stage defense that combines transfer learning (Stage-1) with randomized input perturbation (Stage-2). Stage-1 uses publicly available pretrained models and layer-freezing strategies to embed a small, overfitted model into a high-dimensional target model, reducing the adversary success rate (ASR) and boosting nonmember accuracy. Stage-2 wraps the trained model in a smoothed classifier built from Gaussian perturbations, creating a region of constant output around each input and further lowering ASR toward random chance without retraining. Across CIFAR-10, GTSRB, and CelebA with diverse pretrained backbones (VGG-19, ResNet-18, Swin-T, FaceNet), the approach consistently reduces label-only MIA success while maintaining or improving nonmember accuracy, and it offers a better ASR/accuracy trade-off than regularization and differential privacy baselines. The results highlight the practicality of leveraging transfer learning and lightweight randomization to enhance privacy in limited-data regimes while preserving usability.
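
The TL;DR describes Stage-2 as a smoothed classifier built from Gaussian perturbations. The minimal sketch below assumes the model returns hard labels: it classifies many noisy copies of an input and takes the most frequent label, which is equivalent to averaging one-hot outputs (a model that returns class probabilities could have those averaged instead). The toy linear `base_classifier`, the noise scale `sigma`, the sample count, and the `smoothed_predict` helper are hypothetical choices for illustration, not values or code from the paper.

```python
import random
from collections import Counter


def base_classifier(x):
    """Hypothetical stand-in for the trained Stage-1 target model:
    a linear rule on 2-D inputs that returns a hard class label."""
    return 1 if x[0] + x[1] > 0 else 0


def smoothed_predict(x, classify, sigma=0.5, n_samples=1000, seed=0):
    """Classify n_samples Gaussian-perturbed copies of x and return the most
    frequent label plus the estimated probability of each label."""
    rng = random.Random(seed)
    votes = Counter(
        classify([xi + rng.gauss(0.0, sigma) for xi in x])
        for _ in range(n_samples)
    )
    probs = {label: count / n_samples for label, count in votes.items()}
    return max(probs, key=probs.get), probs


label, probs = smoothed_predict([2.0, 1.0], base_classifier)
print(label)  # 1
```

Because the wrapper only queries the existing model, it needs no retraining. Inputs close to the decision boundary, such as [0.05, 0.0], receive label frequencies near 0.5 each, so the smoothed output no longer tracks exactly where the base model's boundary lies.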

Abstract

Transfer learning (TL) has been demonstrated to improve DNN model performance when training samples are scarce. However, the suitability of TL as a way to reduce the vulnerability of overfitted DNNs to privacy attacks is unexplored. A class of privacy attacks called membership inference attacks (MIAs) aims to determine whether a given sample belongs to the training dataset (member) or not (nonmember). We introduce Double-Dip, a systematic empirical study of the use of TL (Stage-1) combined with randomization (Stage-2) to thwart MIAs on overfitted DNNs without degrading classification accuracy. Our study examines the roles of the shared feature space and parameter values between source and target models, the number of frozen layers, and the complexity of pretrained models. We evaluate Double-Dip on three (Target, Source) dataset pairs: (i) (CIFAR-10, ImageNet), (ii) (GTSRB, ImageNet), (iii) (CelebA, VGGFace2). We consider four publicly available pretrained DNNs: (a) VGG-19, (b) ResNet-18, (c) Swin-T, and (d) FaceNet. Our experiments demonstrate that, against an adversary with either white-box or black-box DNN model access attempting state-of-the-art (SOTA) label-only MIAs, Stage-1 reduces adversary success while also significantly increasing classification accuracy on nonmembers. After Stage-2, the success rate of an adversary carrying out a label-only MIA is further reduced to near 50%, close to that of a random guess, showing the effectiveness of Double-Dip. Stage-2 of Double-Dip also achieves a lower adversary success rate (ASR) and higher classification accuracy than regularization- and differential privacy-based methods.
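
The abstract reads an adversary success rate (ASR) near 50% as close to a random guess. That reading assumes the adversary is evaluated on equal numbers of members and nonmembers, which the abstract does not state. Under that assumption, the minimal sketch below computes ASR as the fraction of correct member/nonmember guesses and shows that a coin-flipping adversary lands near 0.5. The `attack_success_rate` helper, the evaluation set, and the coin-flip adversary are hypothetical and for illustration only.

```python
import random


def attack_success_rate(predicted_member, is_member):
    """Fraction of samples whose membership the adversary guesses correctly."""
    if len(predicted_member) != len(is_member) or not is_member:
        raise ValueError("need equally sized, non-empty inputs")
    correct = sum(p == t for p, t in zip(predicted_member, is_member))
    return correct / len(is_member)


# Hypothetical balanced evaluation set: 5,000 members followed by 5,000 nonmembers.
truth = [True] * 5000 + [False] * 5000

# An adversary that flips a fair coin for every sample.
rng = random.Random(0)
coin_flips = [rng.random() < 0.5 for _ in truth]
print(round(attack_success_rate(coin_flips, truth), 2))  # close to 0.5

# An adversary that is always right (worst case for the defender).
print(attack_success_rate(truth, truth))  # 1.0
```

On an unbalanced evaluation set, the 50% reference point would shift, which is why the balance assumption matters when comparing ASR values to a random guess.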

Paper Structure

This paper contains 17 sections, 5 figures, 9 tables, and 1 algorithm.

Introduction
Preliminaries
Threat Model
Double-Dip: A Two-Stage Approach
Stage-1 of Double-Dip
Stage-2 of Double-Dip
Double-Dip Algorithm
Experiment Settings
Evaluation
Role of Correlated Features
Complexity of Pretrained Models
Stage-1 of Double-Dip vs. SOTA
Stage-2 of Double-Dip: Reducing ASR
Use of Shadow Models
Discussion
(plus 2 further sections)

Figures (5)

Figure 1: Double-Dip Mechanism. Stage-1 uses transfer learning to embed features of a lower-dimensional overfitted DNN into a target model that overcomes overfitting. The target model is learned by 'freezing' the weights in $M$ layers of a publicly available pretrained model and using samples from the target dataset to learn the weights of the remaining $K-M$ layers of the pretrained model. Overcoming overfitting makes the model more resilient to membership inference attacks (MIAs) by reducing an adversary's success rate, even when the training dataset for the target model is small. Stage-2 employs randomization to generate multiple noisy variants of a given input sample $x$. Each noisy variant is provided to the trained target model from Stage-1 to obtain probabilities over the output class labels. An averaging mechanism then 'smooths' these outputs to obtain the final output class label. The key insight underpinning Stage-2 is that randomization affects estimates of the distance from a data point to a decision boundary. As a result, the final output label $y$ does not reveal whether the input $x$ was used to train the target model (member) or not (nonmember).
Figure 2: Stages-1&2 of Double-Dip vs. SOTA: Adversary success rate (ASR, lower is better) and classification accuracy (ACC, higher is better) for 500 training samples from GTSRB with a pretrained VGG-19 model when using (i) no transfer learning (NTL), (ii) regularization (L1/L2), (iii) Stage-1 of Double-Dip, (iv) Stage-1 of Double-Dip + differential privacy (Stage-1+DP), (v) Stage-1 of Double-Dip + regularization (Stage-1+L1/L2), and (vi) Stages-1&2 of Double-Dip. Stages-1&2 of Double-Dip achieves low ASR while simultaneously maintaining high ACC. Stage-1+DP achieves the lowest ASR, but at the cost of a significant reduction in accuracy.
Figure 3: Stages-1&2 of Double-Dip Reduces ASR: Comparison of ASR values when using Stage-1 of Double-Dip (blue bars) and Stages-1&2 of Double-Dip (green bars) for $500$ (top) and $1000$ (bottom) training samples from the GTSRB dataset, when an adversary carrying out an MIA has white-box model access. Stages-1&2 of Double-Dip reduces ASR relative to Stage-1 alone for all three types of pretrained models: VGG-19, ResNet-18, and Swin-T.
Figure 4: Stages-1&2 of Double-Dip, Black-box Access: Comparison of ASR values when using Stage-1 of Double-Dip (blue bars) and Stages-1&2 of Double-Dip (green bars) for training samples from the CelebA dataset, when an adversary carrying out an MIA has white-box (BIM, left) or black-box (HSJ, right) model access. Stages-1&2 of Double-Dip reduces ASR in both cases.
Figure 5: Effect of Number of Frozen Layers: ASR (left) and ACC (right) for different numbers of frozen layers of a pretrained VGG-19 model on the GTSRB dataset, with training sets of size $100$ and $500$. The smaller training set is more sensitive to the number of frozen layers of the pretrained model in Stage-1 of Double-Dip (compare the fluctuations in ASR in the left panel). Fewer samples in the target dataset also enable a lower ASR, but this is accompanied by a reduction in classification accuracy. In contrast, with the larger target dataset, ASR and ACC are largely independent of the number of frozen layers. (See Sec. 7.)
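
Figures 1 and 5 describe Stage-1 as freezing the weights of $M$ of the $K$ layers of a pretrained model and training the remaining $K-M$ layers on the target dataset. The sketch below illustrates that bookkeeping with plain Python objects so it runs without a deep learning framework. The layer names, parameter counts, and the `Layer` and `freeze_first_layers` helpers are hypothetical, and freezing the first $M$ layers (those closest to the input) is an assumption, since the captions do not say which layers are frozen. In PyTorch, the same effect is commonly obtained by setting `requires_grad = False` on every parameter returned by a frozen module's `parameters()`.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    num_params: int
    trainable: bool = True


def freeze_first_layers(layers, m):
    """Freeze the first m of K pretrained layers (their pretrained weights stay
    fixed) and leave the remaining K - m layers trainable on the target data."""
    if not 0 <= m <= len(layers):
        raise ValueError("m must be between 0 and the number of layers K")
    for i, layer in enumerate(layers):
        layer.trainable = i >= m
    return [layer.name for layer in layers if layer.trainable]


# Hypothetical 5-layer pretrained backbone (names and sizes are illustrative only).
backbone = [Layer("conv1", 1_728), Layer("conv2", 36_864), Layer("conv3", 73_728),
            Layer("fc1", 262_144), Layer("fc_out", 5_120)]

trainable = freeze_first_layers(backbone, m=3)
print(trainable)  # ['fc1', 'fc_out']
print(sum(layer.num_params for layer in backbone if layer.trainable))  # 267264
```

Sweeping `m` from 0 to K and retraining for each value is one way to reproduce the kind of frozen-layer comparison shown in Figure 5.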
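
Figures 1, 3, and 4 tie Stage-2 to how label-only MIAs estimate the distance from a sample to a decision boundary, with BIM named as the white-box attack. A common intuition behind such attacks is that training members tend to sit farther from the boundary than nonmembers. The minimal sketch below assumes a toy linear binary classifier (hypothetical weights `W` and bias `B`) whose score gradient is simply `W`. It takes BIM-style steps of size `step` along the sign of that gradient, pushing the score toward the other class, until the predicted label flips, and treats the total perturbation as an L-infinity distance estimate. The `threshold` for calling a sample a member is a hypothetical value; this is a simplified illustration (no epsilon-ball clipping, score gradient instead of loss gradient), not the attack implementation evaluated in the paper.

```python
W = [1.0, 1.0]  # hypothetical weights of a toy linear binary classifier
B = 0.0         # hypothetical bias


def predict(x):
    """Label-only output: 1 if W.x + B > 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(W, x)) + B > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def bim_boundary_distance(x, step=0.01, max_iters=10_000):
    """Estimate the L-infinity distance to the decision boundary by taking
    BIM-style signed-gradient steps until the predicted label flips."""
    original = predict(x)
    # For a linear model the score gradient is W; move against the current class.
    direction = -1.0 if original == 1 else 1.0
    x_adv = list(x)
    for i in range(1, max_iters + 1):
        x_adv = [xi + direction * step * sign(w) for xi, w in zip(x_adv, W)]
        if predict(x_adv) != original:
            return i * step
    return max_iters * step  # label never flipped within the budget


def infer_membership(x, threshold=0.5):
    """Guess 'member' when the sample sits far from the decision boundary."""
    return bim_boundary_distance(x) > threshold
```

On this toy model the exact L-infinity distance from [2.0, 1.0] to the boundary (where the two coordinates sum to zero) is 1.5, so the estimate lands near 1.5, while [0.1, 0.0] sits very close to the boundary and is classified as a nonmember.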
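
For the black-box case, Figure 4 names HSJ (HopSkipJump), which relies only on predicted labels. One ingredient of HopSkipJump is a binary search along the segment between the input and a differently labeled point to locate the decision boundary; the full attack also estimates gradient directions from label queries at the boundary, which is omitted here. The sketch below shows only that binary-search step against a hypothetical toy model `predict`, returning the Euclidean distance from the input to the located boundary point, which an adversary could threshold as in the white-box case. The helper name `boundary_distance_binary_search` and the tolerance are illustrative assumptions.

```python
import math


def predict(x):
    """Hypothetical toy label-only model: class 1 if x[0] > 0, else class 0."""
    return 1 if x[0] > 0 else 0


def boundary_distance_binary_search(x, x_other, query, tol=1e-6):
    """Binary search on the segment from x to a differently labeled point,
    using only label queries; return the L2 distance from x to the boundary."""
    label = query(x)
    if query(x_other) == label:
        raise ValueError("x_other must receive a different label than x")
    lo, hi = 0.0, 1.0  # interpolation weights toward x_other
    while hi - lo > tol:
        mid = (lo + hi) / 2
        point = [(1 - mid) * a + mid * b for a, b in zip(x, x_other)]
        if query(point) == label:
            lo = mid  # still on x's side of the boundary
        else:
            hi = mid  # already crossed the boundary
    boundary = [(1 - hi) * a + hi * b for a, b in zip(x, x_other)]
    return math.dist(x, boundary)


print(boundary_distance_binary_search([2.0, 0.0], [-2.0, 0.0], predict))  # 2.0
```

If `query` were instead a Stage-2 smoothed classifier, each label answer would depend on random noise, which is how randomization can disturb this kind of distance estimate.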
