Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning

Yuxuan Bai, Gauri Pradhan, Marlon Tobaben, Antti Honkela

TL;DR

This work studies privacy leakage in deep transfer learning by empirically evaluating a broad suite of membership inference attacks (MIAs), spanning shadow-model-based, shadow-model-free, and white-box variants. It demonstrates a general power-law trend where MIA efficacy declines as the per-class dataset size increases for many score-based attacks, but notes exceptions such as the white-box IHA, which can excel under high data regimes and non-standard threat models. Across tuning paradigms, LiRA remains the most robust and informative attack in many scenarios, while RMIA offers resilience in certain settings; data augmentation during fine-tuning shows limited benefits for LiRA and RMIA in transfer learning. The study concludes that no single MIA suffices to quantify all privacy risks in deep transfer learning, advocating a multi-attack auditing approach for practical privacy risk assessment.

Abstract

With the emergence of powerful large-scale foundation models, the training paradigm is increasingly shifting from from-scratch training to transfer learning. This enables high-utility training with small, domain-specific datasets typical in sensitive applications. Membership inference attacks (MIAs) provide an empirical estimate of the privacy leakage by machine learning models. Yet, prior assessments of MIAs against models fine-tuned with transfer learning rely on a small subset of possible attacks. We address this by comparing the performance of diverse MIAs in transfer learning settings to help practitioners identify the most efficient attacks for privacy risk evaluation. We find that attack efficacy decreases with the increase in training data for score-based MIAs. We find that there is no single MIA that captures all privacy risks in models trained with transfer learning. While the Likelihood Ratio Attack (LiRA) demonstrates superior performance across most experimental scenarios, the Inverse Hessian Attack (IHA) proves to be more effective against models fine-tuned on the PatchCamelyon dataset in the high-data regime.
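
The power-law trend mentioned in the TL;DR can be checked on one's own measurements with a straight-line fit in log-log space. The following is a minimal sketch, not the authors' analysis code: the functional form `tpr ≈ a * S**(-b)`, the function name `fit_power_law`, and the numbers are all assumptions chosen for illustration, and the data are synthetic rather than results from the paper.

```python
import numpy as np


def fit_power_law(shots, tpr):
    """Fit tpr ≈ a * shots**(-b) by ordinary least squares in log-log space.

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    log_s = np.log(np.asarray(shots, dtype=float))
    log_t = np.log(np.asarray(tpr, dtype=float))
    # A power law is a straight line in log-log coordinates:
    # log(tpr) = log(a) - b * log(shots)
    slope, intercept = np.polyfit(log_s, log_t, deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic values generated from an exact power law (NOT results from the paper).
shots = [16, 64, 256, 1024]
tpr = [0.5 * s ** -0.6 for s in shots]

a, b = fit_power_law(shots, tpr)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A positive fitted exponent `b` corresponds to attack efficacy shrinking as the number of shots per class grows. On real measurements the fit will be noisy, so it is worth inspecting the residuals before reading much into the exponent.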

Paper Structure

This paper contains 43 sections, 2 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: MIA efficacy as measured using LiRA (Carlini et al., 2022) against WideResNet-50-2 (Zagoruyko and Komodakis, 2016) trained from scratch using CIFAR-10 versus the same model pre-trained on ImageNet-1k (Deng et al., 2009) when only the last linear layer of the model is fine-tuned on CIFAR-10. The results are averaged over $3$ repeats, with each repeat using $M+1$ target models ($M=64$) that share the same optimized hyperparameters obtained through hyperparameter optimization (HPO). The errorbars represent the interquartile range (IQR) of corresponding $\textsc{tpr}$ at $\textsc{fpr}$. The plot demonstrates that the attack does not behave similarly across the $2$ training paradigms, highlighting the need to investigate the performance of different MIA approaches against foundation models fine-tuned using deep transfer learning to ensure that a strong attack is used to evaluate their privacy risks.
  • Figure 2: MIA efficacy against ViT-B/16 (Head-only) models as a function of $S$ (shots). Upper: Shadow-model-based attacks using $M$ shadow models. Lower: Shadow-model-free attacks. The errorbars represent the interquartile range (IQR) of the estimated $\textsc{tpr}$ at fixed $\textsc{fpr}$ and the dotted lines represent the maximum of the median MIA efficacy of shadow-model-based and black-box shadow-model-free attacks (IHA excluded). Shadow-model-based attacks generally demonstrate more stable and stronger MIA efficacy compared to shadow-model-free attacks. In the high-shot regime of PatchCamelyon, however, the white-box IHA has a considerable advantage over other MIAs in terms of MIA efficacy, leveraging its access to all but the target record in the training dataset. Results are averaged over 10 repeats and we use 1 target model per repeat.
  • Figure 3: Comparison of MIA efficacy against R-50 fine-tuned on CIFAR-10 across 4 different $S$ (shots) with 3 different parameterization strategies: Head-only, FiLM, and ALL. The errorbars represent the interquartile range (IQR) of the estimated $\textsc{tpr}$ at fixed $\textsc{fpr}$. Results are averaged over 5 repeats with 1 target model in each repeat. For the strongest attacks, there is no considerable difference in MIA efficacy across the 3 parameterization schemes. Some points for Trajectory-MIA with ALL fine-tuning are not visible in the plots due to poor performance or OOM issues.
  • Figure 4: Relationship between MIA efficacy and the number of shadow models ($M$) for LiRA and RMIA against the ViT-B/16 model with Head-only fine-tuning on (a) CIFAR-10 and (b) PatchCamelyon. Results demonstrate MIA efficacy in low data availability (shots $S=16$ for CIFAR-10 and $S=256$ for PatchCamelyon) and high data availability ($S=1024$ for both CIFAR-10 and PatchCamelyon) scenarios. For each configuration, we train $M+1$ models per repeat, using each model as the target while the remaining $M$ serve as shadow models. We compute the average MIA efficacy ($\textsc{tpr}$ at fixed $\textsc{fpr}$) across all $M+1$ target models per repeat, then construct boxplots using these average $\textsc{tpr}$ from $5$ independent repeats. LiRA dominates in terms of efficacy over RMIA despite the latter's performance being more robust to the variations in $M$.
  • Figure 5: Impact of data augmentation on MIA efficacy across $S$ (shots) for ViT-B/16 models Head-only fine-tuned on CIFAR-10 with data augmentations. We compare $2$ augmentation strategies: + Mirror (where the original image plus a horizontally flipped copy of it is used to train the target model) and + Shift (where horizontal flipping and/or $\pm 1$-pixel shifts are applied to the original image), with No augmentation as the baseline. The errorbars represent the interquartile range (IQR). Left and middle panels show MIA efficacy for LiRA and RMIA, respectively, while the right panel shows test accuracy. Results are averaged over 5 repeats and we use $M+1$ target models ($M=64$) per repeat.
  • ...and 8 more figures
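
The captions above report MIA efficacy as $\textsc{tpr}$ at a fixed $\textsc{fpr}$, summarized by the median and interquartile range (IQR) over repeats. The sketch below shows one way such a metric could be computed from raw attack scores. It is an illustration under stated assumptions rather than the authors' evaluation code: the helper names `tpr_at_fpr` and `median_and_iqr`, the convention that higher scores mean "more likely a member", and the thresholding rule are all hypothetical choices, and the example scores are made up.

```python
import numpy as np


def tpr_at_fpr(member_scores, nonmember_scores, fpr):
    """True-positive rate at a threshold admitting at most `fpr` false positives.

    Assumes higher score = more likely a training member (hypothetical convention).
    """
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.sort(np.asarray(nonmember_scores, dtype=float))
    n = len(nonmembers)
    # Number of non-members allowed to be wrongly flagged as members.
    k = min(int(np.floor(fpr * n)), n - 1)
    # Threshold is the (k+1)-th largest non-member score; flagging only
    # scores strictly above it yields at most k false positives.
    threshold = nonmembers[n - k - 1]
    return float(np.mean(members > threshold))


def median_and_iqr(values):
    """Median and (25th, 75th) percentiles, e.g. of TPR over repeats."""
    q1, med, q3 = np.percentile(np.asarray(values, dtype=float), [25, 50, 75])
    return float(med), (float(q1), float(q3))


# Made-up attack scores for a single target model (not data from the paper).
members = [0.9, 0.8, 0.3, 0.1]
nonmembers = [0.7, 0.5, 0.4, 0.2, 0.05, 0.6, 0.35, 0.15, 0.25, 0.45]
tpr = tpr_at_fpr(members, nonmembers, fpr=0.1)

# Made-up per-repeat TPR values, summarized as in the error bars.
med, (q1, q3) = median_and_iqr([0.1, 0.2, 0.3, 0.4, 0.5])
print(tpr, med, q1, q3)
```

Note that at a very low $\textsc{fpr}$ the estimate is only meaningful when there are enough non-member samples: with `n` non-members, any `fpr` below `1/n` allows zero false positives under this thresholding rule.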