Neighborhood Blending: A Lightweight Inference-Time Defense Against Membership Inference Attacks

Osama Zafar, Shaojie Zhan, Tianxi Ji, Erman Ayday

TL;DR

Membership inference attacks (MIAs) threaten privacy by exploiting signals in a model's confidence outputs. The paper introduces Neighborhood Blending, a lightweight, post-hoc defense that smooths those outputs: it samples a differentially private neighborhood of training samples that share the query's predicted label and averages their predictions. Formal privacy and utility guarantees follow from a Gumbel-top-k implementation that realizes the exponential mechanism, and experiments show strong protection against shadow-model and metric-based MIAs across diverse datasets with zero label loss. The result is a practical, model-agnostic wrapper that outperforms DP-SGD and MemGuard in utility preservation and privacy robustness, making it well suited to MLaaS deployments where retraining or white-box access is limited.

Abstract

In recent years, the widespread adoption of Machine Learning as a Service (MLaaS), particularly in sensitive environments, has raised considerable privacy concerns. Of particular importance are membership inference attacks (MIAs), which exploit behavioral discrepancies between training and non-training data to determine whether a specific record was included in the model's training set, thereby presenting significant privacy risks. Although existing defenses, such as adversarial regularization, DP-SGD, and MemGuard, assist in mitigating these threats, they often entail trade-offs such as compromising utility, increased computational requirements, or inconsistent protection against diverse attack vectors. In this paper, we introduce a novel inference-time defense mechanism called Neighborhood Blending, which mitigates MIAs without retraining the model or incurring significant computational overhead. Our approach operates post-training by smoothing the model's confidence outputs based on the neighborhood of a queried sample. By averaging predictions from similar training samples selected using differentially private sampling, our method establishes a consistent confidence pattern, rendering members and non-members indistinguishable to an adversary while maintaining high utility. Significantly, Neighborhood Blending maintains label integrity (zero label loss) and ensures high utility through an adaptive, "pay-as-you-go" distortion strategy. It is a model-agnostic approach that offers a practical, lightweight solution that enhances privacy without sacrificing model utility. Through extensive experiments across diverse datasets and models, we demonstrate that our defense significantly reduces MIA success rates while preserving model performance, outperforming existing post-hoc defenses like MemGuard and training-time techniques like DP-SGD in terms of utility retention.
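
The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The utility function (negative Euclidean distance to the query), the parameters `epsilon` and `sensitivity`, and every function name below are assumptions made for illustration, and the adaptive "pay-as-you-go" distortion strategy is not modeled. The sketch only shows the overall shape: restrict candidates to training samples with the same predicted label, pick a noisy neighborhood with Gumbel-perturbed scores, and average the neighbors' confidence vectors.

```python
import math
import random


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def gumbel_top_k(scores, k, rng):
    """Indices of the k largest scores after adding Gumbel(0, 1) noise."""
    noisy = []
    for i, s in enumerate(scores):
        r = rng.random()
        while r == 0.0:  # avoid log(0)
            r = rng.random()
        noisy.append((s - math.log(-math.log(r)), i))
    noisy.sort(reverse=True)
    return [i for _, i in noisy[:k]]


def neighborhood_blend(query_x, query_probs, train_x, train_probs,
                       m, epsilon, sensitivity, rng):
    """Return a smoothed confidence vector for one query (illustrative only)."""
    label = argmax(query_probs)
    # Candidates: training samples whose predicted label matches the query's.
    candidates = [j for j, p in enumerate(train_probs) if argmax(p) == label]
    if not candidates:
        return list(query_probs)
    # Assumed utility: closer training samples score higher. The scaling
    # epsilon / (2 * sensitivity) follows the usual exponential-mechanism
    # form; the paper's exact calibration is not reproduced here.
    scores = [epsilon * -distance(query_x, train_x[j]) / (2.0 * sensitivity)
              for j in candidates]
    picked = gumbel_top_k(scores, min(m, len(candidates)), rng)
    neighbors = [train_probs[candidates[i]] for i in picked]
    # Average the neighbors' confidence vectors class by class.
    return [sum(p[c] for p in neighbors) / len(neighbors)
            for c in range(len(query_probs))]
```

Because every selected neighbor already has the query's label as its top class, the class-wise average keeps that label on top (ties aside), which is one simple way to see how a scheme of this kind can avoid changing the predicted label.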

Paper Structure

This paper contains 36 sections, 3 theorems, 8 equations, 2 figures, 8 tables, 1 algorithm.

Key Result

Lemma 4.1

Let $u_i$ be the utility score for the $i$-th candidate and $G_i \sim \text{Gumbel}(0, 1)$ be i.i.d. noise. Selecting the top-$m$ indices based on perturbed scores $u_i + G_i$ is equivalent to sampling $m$ indices sequentially without replacement from a categorical distribution proportional to $\exp(u_i)$.
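
The equivalence in the lemma can be checked empirically. The sketch below assumes the categorical weights take the standard Gumbel-max form, $\exp(u_i)$, and compares two samplers against a closed-form probability: one keeps the top-$m$ Gumbel-perturbed scores, the other draws indices one at a time without replacement. All function names are hypothetical and the scores are arbitrary illustrative values, not data from the paper.

```python
import math
import random
from collections import Counter


def gumbel_top_m(scores, m, rng):
    """Add Gumbel(0, 1) noise to each score and keep the m largest, in order."""
    perturbed = []
    for i, u in enumerate(scores):
        r = rng.random()
        while r == 0.0:  # avoid log(0)
            r = rng.random()
        perturbed.append((u - math.log(-math.log(r)), i))
    perturbed.sort(reverse=True)
    return tuple(i for _, i in perturbed[:m])


def sequential_sample(scores, m, rng):
    """Draw m indices one at a time without replacement, P(i) proportional to exp(u_i)."""
    remaining = list(range(len(scores)))
    picked = []
    for _ in range(m):
        weights = [math.exp(scores[i]) for i in remaining]
        choice = rng.choices(remaining, weights=weights, k=1)[0]
        picked.append(choice)
        remaining.remove(choice)
    return tuple(picked)


def exact_probability(scores, order):
    """Closed-form probability of the ordered draw `order` under sequential sampling."""
    remaining = list(range(len(scores)))
    prob = 1.0
    for i in order:
        total = sum(math.exp(scores[j]) for j in remaining)
        prob *= math.exp(scores[i]) / total
        remaining.remove(i)
    return prob


def empirical_frequencies(sampler, scores, m, trials, rng):
    """Relative frequency of each ordered outcome over repeated draws."""
    counts = Counter(sampler(scores, m, rng) for _ in range(trials))
    return {order: c / trials for order, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    scores = [1.0, 0.0, -1.0]
    gumbel = empirical_frequencies(gumbel_top_m, scores, 2, 20000, rng)
    seq = empirical_frequencies(sequential_sample, scores, 2, 20000, rng)
    for order in sorted(gumbel):
        print(order, round(gumbel[order], 3), round(seq.get(order, 0.0), 3),
              round(exact_probability(scores, order), 3))
```

With enough trials the two empirical columns should agree with the closed-form column up to Monte Carlo error. The practical appeal of the Gumbel form is that one pass of noise plus a top-$m$ selection replaces $m$ rounds of renormalized sampling.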

Figures (2)

  • Figure 1: Correlation between attack accuracy with Neighbor defense and the proposed distortion metrics. The left plot shows the attack accuracy versus PCD, and the right plot shows the attack accuracy versus CVD across various ML models and datasets.
  • Figure 2: Comparative heatmap analysis demonstrating the correlation between defensive effectiveness (accuracy drop) and the resulting utility distortion (CVD) across all evaluated datasets and classical ML classifiers.

Theorems & Definitions (5)

  • Lemma 4.1: Exact Sampling Equivalence (vieira2014gumbel; kool2019stochastic)
  • Theorem 4.2: Privacy Guarantee
  • proof
  • Theorem 4.3: Utility Guarantee
  • proof