VTarbel: Targeted Label Attack with Minimal Knowledge on Detector-enhanced Vertical Federated Learning

Juntao Tan; Anran Li; Quanchao Liu; Peng Ran; Lan Zhang

TL;DR

This paper examines the security of detector-enhanced vertical federated learning (VFL) and proposes VTarbel, a two-stage targeted label attack that operates under minimal attacker knowledge. In the preparation stage, the attacker queries the VFL system with a small set of highly expressive benign samples and uses the predicted labels to train a surrogate model and an estimated detector. In the attack stage, these models guide gradient-based perturbations that induce targeted misclassifications and evade detection. Experiments across four model architectures, seven multimodal datasets, and two anomaly detectors show that VTarbel outperforms four state-of-the-art attacks and remains effective against three representative privacy-preserving defenses. The results expose security blind spots in current VFL deployments, highlight the need for attack-aware defenses, and provide a framework for evaluating the robustness of detector-augmented VFL.

Abstract

Vertical federated learning (VFL) enables multiple parties with disjoint features to collaboratively train models without sharing raw data. While the privacy vulnerabilities of VFL have been extensively studied, its security threats, particularly targeted label attacks, remain underexplored. In such attacks, a passive party perturbs its inputs at inference time to force misclassification into adversary-chosen labels. Existing methods rely on unrealistic assumptions (e.g., access to the VFL model's outputs) and ignore the anomaly detectors deployed in real-world systems. To bridge this gap, we introduce VTarbel, a two-stage, minimal-knowledge attack framework explicitly designed to evade detector-enhanced VFL inference. During the preparation stage, the attacker selects a minimal set of high-expressiveness samples (via maximum mean discrepancy), submits them through the VFL protocol to collect predicted labels, and uses these pseudo-labels to train an estimated detector and a surrogate model on local features. In the attack stage, these models guide gradient-based perturbations of the remaining samples, crafting adversarial instances that induce targeted misclassifications and evade detection. We implement VTarbel and evaluate it on four model architectures, seven multimodal datasets, and two anomaly detectors. Across all settings, VTarbel outperforms four state-of-the-art baselines, evades detection, and remains effective against three representative privacy-preserving defenses. These results reveal critical security blind spots in current VFL deployments and underscore the urgent need for robust, attack-aware defenses.
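
To make the sample-selection step in the preparation stage more concrete, the following is a minimal sketch of choosing a small subset of local feature vectors by maximum mean discrepancy (MMD). The abstract does not specify the kernel or the search procedure, so the RBF kernel, the biased MMD estimate, the greedy search, and the names `rbf_kernel`, `mmd2`, `select_expressive`, and `gamma` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between rows of a and rows of b (assumed kernel choice)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


def select_expressive(features, k, gamma=1.0):
    """Greedily pick k row indices whose subset has the smallest MMD^2 to the full set.

    Hypothetical helper: a greedy search is one simple way to use MMD for
    subset selection; the paper's actual procedure may differ.
    """
    selected = []
    remaining = list(range(len(features)))
    for _ in range(k):
        # Try each remaining candidate and keep the one that best matches
        # the distribution of the full local feature set.
        best = min(
            remaining,
            key=lambda i: mmd2(features[selected + [i]], features, gamma),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch, the selected indices would be the samples the attacker submits through the VFL protocol to collect predicted labels. Each greedy step evaluates every remaining candidate, so the cost grows quickly with the number of local samples; caching the kernel matrix would be the natural optimization for larger sets.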
