GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation
Andrey V. Galichin, Mikhail Pautov, Alexey Zhavoronkin, Oleg Y. Rogov, Ivan Oseledets
TL;DR
This work addresses the privacy risks that deep networks pose to their training data by developing GLiRA, a black-box membership inference attack guided by knowledge distillation. GLiRA trains shadow networks by distilling knowledge from the target model and then applies a Likelihood Ratio Attack in the offline setting, exploiting logit-level information to distinguish training members from non-members more accurately than prior methods. The authors compare two distillation losses, Kullback-Leibler (KL) divergence and Mean Squared Error (MSE), and show that the MSE variant generally achieves higher accuracy at low false-positive rates (FPR) across multiple datasets and architectures, while GLiRA-KL performs strongly at higher FPR. The results suggest that distillation-based shadow modeling substantially strengthens membership inference in black-box scenarios, underscoring the privacy risks of API-accessible models and motivating defenses that address logit-level leakage and distribution alignment.
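To make the two distillation objectives concrete, here is a minimal NumPy sketch of how a shadow network's outputs could be compared with the target's. It assumes the attacker can obtain the target's logits for each query; the function names and the epsilon smoothing are illustrative choices, not details taken from the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_distillation_loss(target_logits, shadow_logits, eps=1e-12):
    # Mean KL(p_target || p_shadow) over a batch of queries.
    p = softmax(target_logits)
    q = softmax(shadow_logits)
    per_sample = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(np.mean(per_sample))

def mse_distillation_loss(target_logits, shadow_logits):
    # Mean squared error computed directly on the raw logits.
    return float(np.mean((target_logits - shadow_logits) ** 2))

# Toy batch: 2 queries, 3 classes (values are made up for illustration).
target = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
shadow = np.array([[1.5, 0.7, -0.8], [0.0, 0.5, 2.5]])
kl_value = kl_distillation_loss(target, shadow)
mse_value = mse_distillation_loss(target, shadow)
```

In a real attack either loss would be minimized with a deep learning framework while training the shadow network; the sketch only shows what each loss measures.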
Abstract
While Deep Neural Networks (DNNs) have demonstrated remarkable performance in tasks related to perception and control, several concerns about the privacy of their training data remain unresolved, particularly their vulnerability to Membership Inference Attacks (MIAs). In this paper, we explore a connection between susceptibility to membership inference attacks and vulnerability to distillation-based functionality stealing attacks. In particular, we propose GLiRA, a distillation-guided approach to membership inference attacks on black-box neural networks. We observe that knowledge distillation significantly improves the efficiency of the likelihood ratio membership inference attack, especially in the black-box setting, i.e., when the architecture of the target model is unknown to the attacker. We evaluate the proposed method across multiple image classification datasets and models and demonstrate that likelihood ratio attacks, when guided by knowledge distillation, outperform the current state-of-the-art membership inference attacks in the black-box setting.
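As a rough illustration of the likelihood ratio test that the shadow models feed into, the sketch below follows the standard offline formulation: fit a Gaussian to the logit-scaled confidences that shadow models (none of which saw the sample) assign to it, then score the target's confidence by the Gaussian CDF. This is an assumption about the general technique, not the authors' implementation; all names and numbers are hypothetical.

```python
import math
import numpy as np

def logit_scaled_confidence(probs, label, eps=1e-12):
    # log(p / (1 - p)) for the probability assigned to the true label.
    p = float(np.clip(probs[label], eps, 1.0 - eps))
    return math.log(p) - math.log(1.0 - p)

def offline_lira_score(target_conf, shadow_out_confs):
    # Fit N(mu, sigma^2) to confidences from shadow models that did NOT
    # train on the sample, then return Pr[Z <= target_conf].
    # Scores near 1 suggest the sample was a training member.
    mu = float(np.mean(shadow_out_confs))
    sigma = float(np.std(shadow_out_confs)) + 1e-8
    z = (target_conf - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Made-up confidences from four shadow models for one sample.
shadow_confs = np.array([0.8, 1.1, 0.9, 1.2])
member_like = offline_lira_score(6.0, shadow_confs)
nonmember_like = offline_lira_score(0.5, shadow_confs)
```

Thresholding the score at different values trades true-positive rate against false-positive rate, which is why such attacks are typically compared at fixed low FPR.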
