VTarbel: Targeted Label Attack with Minimal Knowledge on Detector-enhanced Vertical Federated Learning

Juntao Tan, Anran Li, Quanchao Liu, Peng Ran, Lan Zhang

TL;DR

This paper addresses the security of detector-enhanced vertical federated learning (VFL) by proposing VTarbel, a two-stage targeted label attack that operates under minimal attacker knowledge. In the preparation stage, the attacker submits a small, highly expressive set of benign samples for inference and uses the returned predictions to train a local surrogate model and an estimated detector. In the attack stage, these models guide gradient-based perturbations that induce targeted misclassifications while evading detection. Experiments across four architectures, seven multimodal datasets, and two anomaly detectors show that VTarbel outperforms four state-of-the-art baselines and remains effective against three representative privacy-preserving defenses, exposing significant security blind spots in current VFL deployments. The work highlights the need for attack-aware defenses and offers a framework for evaluating the robustness of detector-augmented VFL.

Abstract

Vertical federated learning (VFL) enables multiple parties with disjoint features to collaboratively train models without sharing raw data. While the privacy vulnerabilities of VFL have been extensively studied, its security threats, particularly targeted label attacks, remain underexplored. In such attacks, a passive party perturbs its inputs at inference time to force misclassification into adversary-chosen labels. Existing methods rely on unrealistic assumptions (e.g., access to the VFL model's outputs) and ignore the anomaly detectors deployed in real-world systems. To bridge this gap, we introduce VTarbel, a two-stage, minimal-knowledge attack framework explicitly designed to evade detector-enhanced VFL inference. During the preparation stage, the attacker selects a minimal set of high-expressiveness samples (via maximum mean discrepancy), submits them through the VFL protocol to collect predicted labels, and uses these pseudo-labels to train an estimated detector and a surrogate model on local features. In the attack stage, these models guide gradient-based perturbations of the remaining samples, crafting adversarial instances that induce targeted misclassifications and evade detection. We implement VTarbel and evaluate it against four model architectures, seven multimodal datasets, and two anomaly detectors. Across all settings, VTarbel outperforms four state-of-the-art baselines, evades detection, and remains effective against three representative privacy-preserving defenses. These results reveal critical security blind spots in current VFL deployments and underscore the urgent need for robust, attack-aware defenses.
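
To make the sample-selection step of the preparation stage more concrete, the sketch below greedily picks a small subset whose distribution stays close to the full local pool under squared maximum mean discrepancy (MMD). This is only an illustration under stated assumptions: the RBF kernel, the `gamma` value, the biased MMD estimator, the greedy search, and the function names `rbf_kernel`, `mmd2`, and `greedy_mmd_select` are all hypothetical choices made here, not the selection procedure from the paper.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(K, idx):
    # Biased squared MMD between the subset `idx` and the full pool,
    # computed from the precomputed full kernel matrix K.
    idx = np.asarray(idx)
    return K[np.ix_(idx, idx)].mean() - 2.0 * K[idx, :].mean() + K.mean()

def greedy_mmd_select(x, m, gamma=1.0):
    # Assumption: greedy forward selection; at each step add the sample
    # that yields the lowest squared MMD between subset and pool.
    K = rbf_kernel(x, x, gamma)
    selected, remaining = [], list(range(len(x)))
    for _ in range(m):
        best = min(remaining, key=lambda j: mmd2(K, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `greedy_mmd_select(x, m)` on an `(n, d)` feature array returns `m` row indices. On data with two well-separated clusters, a budget of two tends to place one pick in each cluster, which is the sense in which a small subset can still be representative of the pool.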

Paper Structure

This paper contains 33 sections, 13 equations, 12 figures, 5 tables, and 2 algorithms.

Figures (12)

  • Figure 1: Illustration of a detector-enhanced VFL inference system. A malicious passive party ($P_1$) submits an adversarial feature embedding ($e_1$) to the active party $P_K$. The active party employs an anomaly detector $\phi$ to assess whether the aggregated embedding $E$ is anomalous. Based on the detector's output, the final prediction is either approved ($\hat{y}$) or rejected ($\texttt{REJ}$).
  • Figure 2: Impact of the anomaly detector on the attack success rate (ASR) in the VFL inference system. "Ground-Truth" denotes the proportion of the targeted label in the original test set and serves as a baseline for comparison.
  • Figure 3: Impact of the preparation stage ratio $\rho$ on the performance of the attacker's local estimated detector and surrogate model.
  • Figure 4: Impact of the preparation stage ratio $\rho$ on the ASR of each stage and on the overall ASR.
  • Figure 5: Overview of the two-stage attack framework, VTarbel. The green samples represent test samples with high expressiveness, while the blue samples represent those with lower expressiveness. The red samples denote the maliciously generated samples fed into the VFL inference system. Different shapes (circle, triangle, and square) indicate test samples from different classes.
  • ...and 7 more figures
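
The inference flow described in the Figure 1 caption can be sketched roughly as follows. The sketch rests on several assumptions that the caption does not confirm: embeddings are aggregated by concatenation, the detector $\phi$ is a simple distance-to-center threshold standing in for whichever anomaly detectors the paper evaluates, and the active party's top model is a linear layer. The names `fit_distance_detector`, `vfl_inference`, and `top_weights` are hypothetical.

```python
import numpy as np

REJ = "REJ"  # sentinel returned when the detector rejects a request

def fit_distance_detector(benign_E, quantile=0.95):
    # Assumption: a stand-in detector that flags aggregated embeddings
    # lying farther from the benign mean than a chosen quantile of
    # the benign distances.
    center = benign_E.mean(axis=0)
    dists = np.linalg.norm(benign_E - center, axis=1)
    threshold = np.quantile(dists, quantile)

    def phi(E):
        return bool(np.linalg.norm(E - center) > threshold)

    return phi

def vfl_inference(embeddings, top_weights, phi):
    # Aggregate the parties' embeddings e_1..e_K into E (assumed: concatenation).
    E = np.concatenate(embeddings)
    # The active party consults the detector before releasing a prediction.
    if phi(E):
        return REJ
    logits = top_weights @ E  # assumed linear top model
    return int(np.argmax(logits))
```

Under this setup, a malicious passive party only controls its own entry in `embeddings`, so an adversarial embedding has to shift the prediction while keeping the aggregated `E` on the approved side of `phi`.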

Theorems & Definitions (1)

  • Definition 1: Attack Success Rate
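
The definition itself is not reproduced in this overview. As a rough illustration only, the sketch below assumes that, in a detector-enhanced system, an attack on a sample counts as successful when the prediction is approved (not rejected) and equals the targeted label. The function name `attack_success_rate` and the `"REJ"` sentinel are assumptions made for this example and may differ from the paper's formal definition.

```python
def attack_success_rate(outputs, target_label, rej="REJ"):
    # `outputs` holds one system response per attacked sample:
    # either a predicted label or the rejection sentinel.
    if not outputs:
        raise ValueError("outputs must contain at least one response")
    # Assumption: a rejected sample never counts as a success.
    hits = sum(1 for out in outputs if out != rej and out == target_label)
    return hits / len(outputs)
```

For example, with responses `[3, 3, "REJ", 1]` and target label `3`, two of the four samples are approved with the targeted label, so the function returns `0.5`.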