VMask: Tunable Label Privacy Protection for Vertical Federated Learning via Layer Masking

Juntao Tan, Lan Zhang, Zhonghao Hu, Kai Yang, Peng Ran, Bo Li

TL;DR

VMask defends vertical federated learning (VFL) against model completion attacks by masking layers with secret sharing. It identifies critical layers via accumulated gradient norms, uses a shadow model with a tunable privacy budget to bound label leakage, and iteratively masks layers until leakage stays under the budget. The framework preserves near-full model utility while significantly reducing attack success, and it runs far faster than cryptographic defenses. Empirical results across five architectures and 13 datasets demonstrate strong privacy-utility trade-offs and practical deployability in privacy-sensitive VFL settings.
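The iterative selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the function name `select_layers_to_mask`, the `estimate_leakage` callback (standing in for the shadow model), the "largest gradient norm first" ordering, and the toy numbers are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Set


def select_layers_to_mask(
    grad_norms: Dict[str, float],
    estimate_leakage: Callable[[Set[str]], float],
    privacy_budget: float,
) -> List[str]:
    """Greedily mask layers until estimated leakage is within the budget."""
    # Assumption: a larger accumulated gradient norm means a more critical layer.
    ranked = sorted(grad_norms, key=grad_norms.get, reverse=True)
    masked: List[str] = []
    for layer in ranked:
        # Stop as soon as the (hypothetical) shadow-model estimate fits the budget.
        if estimate_leakage(set(masked)) <= privacy_budget:
            break
        masked.append(layer)
    return masked


def toy_leakage(masked: Set[str]) -> float:
    """Toy stand-in for a shadow model: leakage falls as more layers are masked."""
    return max(0.1, 0.9 - 0.3 * len(masked))


# Hypothetical accumulated gradient norms for a three-layer bottom model.
norms = {"conv1": 4.2, "conv2": 1.3, "fc": 2.7}
chosen = select_layers_to_mask(norms, toy_leakage, privacy_budget=0.35)
print(chosen)
```

With these toy values, masking stops after two layers because the estimated leakage then falls below the budget of 0.35. A tighter budget masks more layers and a looser one masks fewer, which is the sense in which the budget is tunable.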

Abstract

Though vertical federated learning (VFL) is generally considered to be privacy-preserving, recent studies have shown that VFL systems are vulnerable to label inference attacks originating from various attack surfaces. Among these attacks, the model completion (MC) attack is currently the most powerful. Existing defenses against it either sacrifice model accuracy or incur impractical computational overhead. In this paper, we propose VMask, a novel label privacy protection framework that defends against the MC attack through layer masking. Our key insight is to disrupt the strong correlation between input data and intermediate outputs by applying the secret sharing (SS) technique to mask layer parameters in the attacker's model. We devise a strategy for selecting critical layers to mask, reducing the overhead that would arise from naively applying SS to the entire model. Moreover, VMask is the first framework to offer defenders a tunable privacy budget, allowing flexible control over the level of label privacy according to actual requirements. We built a VFL system, implemented VMask on it, and extensively evaluated it using five model architectures and 13 datasets with different modalities, comparing it to 12 other defense methods. The results demonstrate that VMask achieves the best privacy-utility trade-off, successfully thwarting the MC attack (reducing the label inference accuracy to a random-guessing level) while preserving model performance (e.g., in a Transformer-based model, the average drop in VFL model accuracy is only 0.09%). VMask's runtime is up to 60,846 times faster than cryptography-based methods, and it exceeds that of standard VFL by only 1.8 times in a large Transformer-based model, which is generally acceptable.
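To make the secret sharing idea concrete, here is a minimal sketch of two-party additive secret sharing applied to one layer's parameters. It is a generic illustration, not VMask's protocol: the ring size `MODULUS`, the fixed-point `SCALE`, and the helper names `encode`, `decode`, `share`, and `reconstruct` are assumptions for the example, and the abstract does not specify how shares are distributed between parties.

```python
import secrets
from typing import List, Tuple

MODULUS = 2**32  # ring size for the shares (assumed)
SCALE = 2**16    # fixed-point scaling factor (assumed)


def encode(x: float) -> int:
    """Map a float to a fixed-point element of the ring."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Map a ring element back to a signed float."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(weights: List[float]) -> Tuple[List[int], List[int]]:
    """Split each weight into two shares that sum to it modulo MODULUS."""
    share_a = [secrets.randbelow(MODULUS) for _ in weights]
    share_b = [(encode(w) - a) % MODULUS for w, a in zip(weights, share_a)]
    return share_a, share_b


def reconstruct(share_a: List[int], share_b: List[int]) -> List[float]:
    """Recombine both shares; either share alone is uniformly random."""
    return [decode((a + b) % MODULUS) for a, b in zip(share_a, share_b)]


# Hypothetical parameters of a single masked layer.
layer_weights = [0.5, -1.25, 3.0]
a, b = share(layer_weights)
assert reconstruct(a, b) == layer_weights
```

A party holding only one share sees uniformly random ring elements, so it cannot evaluate the masked layer on its own. That is the intuition behind breaking the link between inputs and intermediate outputs.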

Paper Structure

This paper contains 43 sections, 4 equations, 14 figures, 8 tables, and 6 algorithms.

Figures (14)

  • Figure 1: A typical vertical federated learning system.
  • Figure 2: The impact of single-layer masking and accumulative layer masking on MC attack accuracy. "Baseline" denotes the MC attack accuracy obtained from the intact bottom model (without masking any layers) after VFL training.
  • Figure 3: Our insight is to randomize certain layer parameters of the attacker's bottom model to disrupt the correlation between input data $X$ and feature embedding $Z$, thereby reducing the strong predictive capability of $Z$ for the label $Y$.
  • Figure 4: Illustration of the VMask framework.
  • Figure 5: Comparison of defense methods' effectiveness against MC attack, evaluated using models trained at each epoch.
  • ...and 9 more figures