Feature Inference Attack on Shapley Values

Xinjian Luo, Yangfan Jiang, Xiaokui Xiao

TL;DR

This work demonstrates that Shapley-value explanations, despite their interpretability benefits, can be exploited to reconstruct private model inputs in pay-per-query MLaaS settings. It introduces two adversaries: one uses an auxiliary dataset to learn a mapping from explanations to inputs, and the other exploits local linearity with data-independent interpolation; both are validated across multiple platforms and datasets. The findings reveal meaningful leakage, particularly for important features, and show that the attacks are robust to sampling noise, motivating concrete defenses such as quantization and selective disclosure of Shapley values. Overall, the paper substantially advances the understanding of privacy risks in local Shapley-based explanations and calls for privacy-preserving interpretability research with practical safeguards.
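The two defenses mentioned above can be illustrated with a minimal sketch: quantizing each Shapley value to a fixed step, and disclosing only the k values with the largest magnitude. The step size, the value of k, and the function names `quantize` and `top_k` are hypothetical choices for illustration, not the configurations evaluated in the paper.

```python
def quantize(phi, step):
    # Round each Shapley value to the nearest multiple of `step` (hypothetical step size).
    return [step * round(v / step) for v in phi]


def top_k(phi, k):
    # Selective disclosure: keep the k largest-magnitude Shapley values, withhold the rest as None.
    keep = set(sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)[:k])
    return [v if i in keep else None for i, v in enumerate(phi)]


phi = [0.42, -0.07, 0.19, 0.003]  # hypothetical explanation for a four-feature input
print(quantize(phi, 0.1))
print(top_k(phi, 2))
```

Both transformations coarsen what an attacker observes, at the cost of a less detailed explanation for the legitimate user, so choosing `step` and `k` is a privacy-utility trade-off.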

Abstract

As a solution concept in cooperative game theory, the Shapley value is widely recognized in model interpretability studies and has been adopted by leading Machine Learning as a Service (MLaaS) providers such as Google, Microsoft, and IBM. However, although Shapley value-based model interpretability methods have been thoroughly studied, few researchers have considered the privacy risks incurred by Shapley values, even though interpretability and privacy are two foundations of machine learning (ML) models. In this paper, we investigate the privacy risks of Shapley value-based model interpretability methods through feature inference attacks, i.e., reconstructing private model inputs from their Shapley value explanations. Specifically, we present two adversaries. The first adversary can reconstruct the private inputs by training an attack model on an auxiliary dataset, using black-box access to the model interpretability service. The second adversary, even without any background knowledge, can successfully reconstruct most of the private features by exploiting the local linear correlations between the model inputs and outputs. We perform the proposed attacks on leading MLaaS platforms, namely Google Cloud, Microsoft Azure, and IBM aix360. The experimental results demonstrate the vulnerability of the state-of-the-art Shapley value-based model interpretability methods used by these platforms and highlight the necessity of designing privacy-preserving model interpretability methods in future studies. To the best of our knowledge, this is also the first work that investigates the privacy risks of Shapley values.
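To make the first adversary concrete, here is a minimal Python sketch under stated assumptions. The explanation service is simulated by exact Shapley enumeration in which features outside a coalition are replaced by a fixed baseline (one common convention; the platforms' own estimators, such as sampling-based approximations, may differ). The hosted model `model` is a hypothetical three-feature function, and the attack model (ψ in Figure 2) is replaced by a simple 1-nearest-neighbour lookup over the auxiliary explanations rather than the attack model trained in the paper. The names `explain`, `train_attack_model`, and `BASELINE` are illustrative.

```python
import itertools
import math
import random


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def model(z):
    # Hypothetical black-box model standing in for the MLaaS-hosted model.
    return z[0] + z[1] + z[2] + z[0] * z[1]


BASELINE = [0.0, 0.0, 0.0]  # hypothetical reference input used by the explainer


def explain(x):
    # Simulates one paid query to the interpretability service.
    return shapley_values(model, x, BASELINE)


def train_attack_model(aux_inputs):
    # Query the service with the attacker's auxiliary inputs and store (explanation, input) pairs.
    pairs = [(explain(x), x) for x in aux_inputs]

    def psi(s):
        # 1-NN stand-in for psi: return the auxiliary input whose explanation is closest to s.
        return min(pairs, key=lambda p: math.dist(p[0], s))[1]

    return psi


random.seed(0)
aux = [[random.random() for _ in range(3)] for _ in range(500)]
psi = train_attack_model(aux)

# Private inputs the attacker never sees; only their explanations leak.
private = [[random.random() for _ in range(3)] for _ in range(50)]
attack_err = sum(abs(a - b) for x in private for a, b in zip(psi(explain(x)), x)) / (50 * 3)
guess_err = sum(abs(0.5 - v) for x in private for v in x) / (50 * 3)
print(f"attack MAE={attack_err:.3f}, guessing MAE={guess_err:.3f}")
```

Comparing the attack's mean absolute error with that of always guessing the midpoint 0.5 gives a rough sense of how much the explanations reveal. Exact enumeration needs exponentially many model calls in the number of features, which is why real services approximate Shapley values instead.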

Paper Structure

This paper contains 23 sections, 19 equations, 9 figures, 4 tables, and 2 algorithms.

Figures (9)

  • Figure 1: Attack framework based on explanation reports: (1) the attacker sends fake queries to ML platforms and receives decisions with explanations; (2) based on the model inputs and explanations, the attacker designs a feature inference algorithm; (3) the explanation reports from the target customers may be obtained by the attacker; (4) from these explanations, the attacker can reconstruct the corresponding private features via the attack algorithm.
  • Figure 2: Overview of the attack with an auxiliary dataset. $\psi$ is the attack model.
  • Figure 3: The Pearson correlation coefficient $\rho$ between each of the first seven features of the Diabetes dataset and (i) the model output $\hat{y}$ and (ii) the corresponding explanations $\boldsymbol{s}$. An NN (neural network) and Google Cloud are used as the test model and platform, respectively.
  • Figure 4: Examples of correlations between private features and the corresponding Shapley values.
  • Figure 5: (a)-(d): the performance of attack 1 w.r.t. different sizes of auxiliary datasets; (e)-(h): the performance of attack 2 w.r.t. different sizes of random datasets.
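Figures 3 and 4 and the second adversary rest on the observation that a feature's value and its Shapley value are often strongly correlated. The sketch below illustrates that idea under simplifying assumptions and is not the paper's algorithm: the attacker draws random, data-independent queries uniformly from an assumed known feature range [0, 1], obtains their explanations (again simulated by exact baseline Shapley enumeration on a hypothetical `model`), measures the per-feature Pearson correlation ρ between each feature and its Shapley value, and then inverts each feature with a one-dimensional least-squares line fitted on the random queries. The helpers `pearson` and `fit_line` are illustrative.

```python
import itertools
import math
import random


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def model(z):
    # Hypothetical model: feature 0 dominates, feature 2 only acts through an interaction.
    return 3 * z[0] + z[1] ** 2 + z[0] * z[2]


def pearson(a, b):
    # Sample Pearson correlation coefficient between sequences a and b.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def fit_line(s, x):
    # Least-squares fit x ~ slope * s + intercept.
    ms, mx = sum(s) / len(s), sum(x) / len(x)
    slope = sum((u - ms) * (v - mx) for u, v in zip(s, x)) / sum((u - ms) ** 2 for u in s)
    return slope, mx - slope * ms


BASELINE = [0.0, 0.0, 0.0]
random.seed(1)
# Data-independent queries: no real records, just uniform draws from the assumed range.
queries = [[random.random() for _ in range(3)] for _ in range(1000)]
expl = [shapley_values(model, q, BASELINE) for q in queries]

rho, lines = [], []
for i in range(3):
    xi = [q[i] for q in queries]
    si = [e[i] for e in expl]
    rho.append(pearson(xi, si))
    lines.append(fit_line(si, xi))

# Reconstruct private inputs from their explanations, one feature at a time.
private = [[random.random() for _ in range(3)] for _ in range(200)]
mae = [0.0, 0.0, 0.0]
for x in private:
    s = shapley_values(model, x, BASELINE)
    for i, (slope, intercept) in enumerate(lines):
        mae[i] += abs(slope * s[i] + intercept - x[i]) / len(private)

print("rho:", [round(r, 3) for r in rho])
print("per-feature MAE:", [round(m, 3) for m in mae])
```

In this toy model, feature 0 dominates the output while feature 2 contributes only through an interaction, so one would expect both ρ and reconstruction accuracy to be highest for feature 0 and lowest for feature 2, consistent with the observation that important features leak more.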