Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
Yuxuan Bai, Gauri Pradhan, Marlon Tobaben, Antti Honkela
TL;DR
This work studies privacy leakage in deep transfer learning by empirically evaluating a broad suite of membership inference attacks (MIAs), spanning shadow-model-based, shadow-model-free, and white-box variants. For many score-based attacks, it finds a general power-law trend in which MIA efficacy declines as the per-class dataset size increases. It also notes exceptions such as the white-box IHA, which can excel in high-data regimes and under non-standard threat models. Across tuning paradigms, LiRA remains the most robust and informative attack in many scenarios, while RMIA offers resilience in certain settings; data augmentation during fine-tuning shows limited benefits for LiRA and RMIA in transfer learning. The study concludes that no single MIA suffices to quantify all privacy risks in deep transfer learning, and it advocates a multi-attack auditing approach for practical privacy risk assessment.
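To make the power-law trend concrete, the following is a minimal sketch of how such a relationship could be fitted, assuming attack efficacy is summarized as a single number per dataset size (for example, a true positive rate) and that the trend has the form efficacy ~= a * S^b, where S is the number of examples per class. The function name `fit_power_law` and the data values are hypothetical and purely synthetic; they are not results from the paper.

```python
import numpy as np

def fit_power_law(shots, efficacy):
    """Fit efficacy ~= a * shots**b by ordinary least squares in log-log space.

    Returns (a, b). A negative b means efficacy declines as shots grow.
    """
    log_s = np.log(np.asarray(shots, dtype=float))
    log_e = np.log(np.asarray(efficacy, dtype=float))
    # A power law is a straight line in log-log space: log e = log a + b * log s
    b, log_a = np.polyfit(log_s, log_e, 1)
    return float(np.exp(log_a)), float(b)

# Synthetic illustration only (NOT measurements from the paper):
# efficacy generated exactly from a = 0.9, b = -0.5.
shots = np.array([8, 16, 32, 64, 128, 256], dtype=float)
efficacy = 0.9 * shots ** -0.5

a, b = fit_power_law(shots, efficacy)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On real measurements, the fit would be noisy, and attacks that deviate from the trend (the summary names IHA as one) would show up as a poor linear fit in log-log space.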
Abstract
With the emergence of powerful large-scale foundation models, the training paradigm is increasingly shifting from from-scratch training to transfer learning. This enables high-utility training with the small, domain-specific datasets typical of sensitive applications. Membership inference attacks (MIAs) provide an empirical estimate of the privacy leakage of machine learning models. Yet, prior assessments of MIAs against models fine-tuned with transfer learning rely on a small subset of possible attacks. We address this by comparing the performance of diverse MIAs in transfer learning settings to help practitioners identify the most efficient attacks for privacy risk evaluation. We find that, for score-based MIAs, attack efficacy decreases as the amount of training data increases. We also find that no single MIA captures all privacy risks in models trained with transfer learning. While the Likelihood Ratio Attack (LiRA) demonstrates superior performance across most experimental scenarios, the Inverse Hessian Attack (IHA) proves to be more effective against models fine-tuned on the PatchCamelyon dataset in the high-data regime.
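As an illustration of how a score-based MIA can be evaluated, here is a minimal sketch, assuming each attack assigns every example a real-valued membership score (higher meaning "more likely a training member") and that efficacy is reported as the true positive rate (TPR) at a fixed false positive rate (FPR). This metric is common in the MIA literature, but the excerpt above does not state which metric the paper uses, so treat that choice as an assumption. The helper `tpr_at_fpr` and all scores are hypothetical.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """TPR of a threshold attack whose FPR on non-members is at most target_fpr.

    The attack predicts "member" when score > threshold.
    """
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]  # descending
    # Number of non-members we may misclassify as members.
    k = int(np.floor(target_fpr * len(nonmembers)))
    # Using the (k+1)-th largest non-member score as the threshold ensures that
    # at most k non-members score strictly above it.
    threshold = nonmembers[k] if k < len(nonmembers) else -np.inf
    return float(np.mean(members > threshold))

# Made-up scores for illustration only (NOT data from the paper).
member_scores = [0.9, 0.8, 0.55, 0.4]
nonmember_scores = [0.6, 0.5, 0.45, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]

tpr_strict = tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.0)
tpr_loose = tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.25)
print(tpr_strict, tpr_loose)
```

Running several attacks through the same evaluation on the same member and non-member sets is one simple way to carry out the kind of multi-attack comparison the abstract describes.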
