Neighborhood Blending: A Lightweight Inference-Time Defense Against Membership Inference Attacks
Osama Zafar, Shaojie Zhan, Tianxi Ji, Erman Ayday
TL;DR
Membership inference attacks threaten privacy by exploiting signals in model confidence. The paper introduces Neighborhood Blending, a lightweight, post-hoc defense that smooths confidence outputs by sampling a differentially private neighborhood of training samples that share the query's predicted label and averaging their predictions. It provides formal privacy and utility guarantees via a Gumbel-top-k implementation that realizes the exponential mechanism, and it demonstrates strong empirical protection against shadow-model and metric-based MIAs across diverse datasets while maintaining zero label loss. The approach is a practical, model-agnostic wrapper that outperforms DP-SGD and MemGuard in utility preservation and privacy robustness, making it well suited to MLaaS deployments where retraining or white-box access is limited.
Abstract
In recent years, the widespread adoption of Machine Learning as a Service (MLaaS), particularly in sensitive environments, has raised considerable privacy concerns. Of particular importance are membership inference attacks (MIAs), which exploit behavioral discrepancies between training and non-training data to determine whether a specific record was included in the model's training set, thereby presenting significant privacy risks. Although existing defenses, such as adversarial regularization, DP-SGD, and MemGuard, help mitigate these threats, they often entail trade-offs such as reduced utility, increased computational requirements, or inconsistent protection against diverse attack vectors. In this paper, we introduce a novel inference-time defense mechanism called Neighborhood Blending, which mitigates MIAs without retraining the model or incurring significant computational overhead. Our approach operates post-training by smoothing the model's confidence outputs based on the neighborhood of a queried sample. By averaging predictions from similar training samples selected using differentially private sampling, our method establishes a consistent confidence pattern, rendering members and non-members indistinguishable to an adversary while maintaining high utility. Notably, Neighborhood Blending maintains label integrity (zero label loss) and ensures high utility through an adaptive, "pay-as-you-go" distortion strategy. The approach is model-agnostic and offers a practical, lightweight way to enhance privacy without sacrificing model utility. Through extensive experiments across diverse datasets and models, we demonstrate that our defense significantly reduces MIA success rates while preserving model performance, outperforming existing post-hoc defenses like MemGuard and training-time techniques like DP-SGD in terms of utility retention.
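To make the mechanism described above concrete, the following is a minimal Python sketch of one blending step, not the authors' implementation. Several pieces are assumptions made only for illustration: the function name `neighborhood_blend`, the similarity score (negative L1 distance between confidence vectors), the `sensitivity` placeholder, and the per-pick budget `eps_per_pick`. The adaptive "pay-as-you-go" distortion strategy is not modeled. The sketch relies on the standard fact that adding Gumbel(0, 1) noise to scaled scores and keeping the top k is equivalent to k sequential draws without replacement from the exponential mechanism.

```python
import numpy as np


def neighborhood_blend(query_conf, train_conf, k=5, eps_per_pick=1.0,
                       sensitivity=1.0, rng=None):
    """Return a smoothed confidence vector for one query (illustrative only).

    query_conf : (C,) confidence vector the model produced for the query.
    train_conf : (N, C) confidence vectors of the training samples.
    The score function and sensitivity value are assumptions, not the paper's.
    """
    rng = np.random.default_rng() if rng is None else rng
    query_conf = np.asarray(query_conf, dtype=float)
    train_conf = np.asarray(train_conf, dtype=float)

    # Restrict candidates to training samples with the same predicted label.
    label = int(np.argmax(query_conf))
    pool = train_conf[np.argmax(train_conf, axis=1) == label]
    if len(pool) == 0:
        return query_conf  # nothing to blend with; fall back to the raw output

    # Assumed utility score: closer confidence vectors score higher.
    scores = -np.abs(pool - query_conf).sum(axis=1)

    # Gumbel-top-k: perturb the scaled scores with Gumbel(0, 1) noise and
    # keep the k largest, which realizes exponential-mechanism sampling.
    noisy = eps_per_pick * scores / (2.0 * sensitivity) + rng.gumbel(size=len(pool))
    k = min(k, len(pool))
    top = np.argpartition(-noisy, k - 1)[:k]

    # Average the selected neighbors' predictions.
    return pool[top].mean(axis=0)
```

Because every selected neighbor already has its maximum at the query's predicted label, the averaged vector keeps that label as its argmax, which is consistent with the zero-label-loss property stated above. How the per-pick budget composes into an overall privacy guarantee is left to the paper's analysis.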
