Cooperative Decentralized Backdoor Attacks on Vertical Federated Learning
Seohyun Lee, Wenzhi Fang, Anindya Bijoy Das, Seyyedali Hosseinalipour, David J. Love, Christopher G. Brinton
TL;DR
This paper proposes a server-free, cooperative backdoor attack on vertical federated learning (VFL), in which multiple adversaries collude over a graph to implant a backdoor without relying on server gradients. The attack combines a variational autoencoder (VAE) with metric learning so that adversaries can locally infer which datapoints carry the target label, uses a graph-based consensus to select the datapoints to poison, and then embeds an intensity-based trigger that is split across the adversaries. The authors provide a convergence analysis with a stationarity-gap bound that scales with the gradient perturbation $\delta(\rho)$, which increases with the graph connectivity $\rho$. Experiments on five image datasets show that the proposed method achieves higher attack success rates (ASR) than baselines while maintaining clean data accuracy (CDA), and that it remains robust to gradient-noise defenses. The results indicate that higher adversary connectivity and trigger intensity make the attack more potent, highlighting security risks in cross-device VFL and the need for defenses that target decentralized label inference and collaboration patterns.
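To make the split-trigger idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an image is partitioned column-wise across clients, that each adversary writes its share of a small trigger patch into the top-left corner of its own columns, and that "intensity" is a scalar multiplier on the trigger's pixel values. The function name `embed_split_trigger` and all parameters are hypothetical.

```python
import numpy as np

def embed_split_trigger(image, adversary_cols, trigger, intensity=1.0):
    """Write one piece of `trigger` into each adversary's column slice.

    Assumptions (illustrative only): pixel values lie in [0, 1], each
    adversary owns a contiguous block of columns, and each trigger piece
    is no wider than the slice it is written into.
    """
    poisoned = image.copy()
    # Split the trigger column-wise into one piece per adversary.
    pieces = np.array_split(trigger, len(adversary_cols), axis=1)
    for cols, piece in zip(adversary_cols, pieces):
        h, w = piece.shape
        # Scale the piece by the intensity and clip to the valid pixel range.
        poisoned[:h, cols.start:cols.start + w] = np.clip(intensity * piece, 0.0, 1.0)
    return poisoned

# Two adversaries, each holding half of an 8x8 image.
image = np.zeros((8, 8))
trigger = np.full((2, 4), 0.5)
adversary_cols = [slice(0, 4), slice(4, 8)]
poisoned = embed_split_trigger(image, adversary_cols, trigger, intensity=2.0)
```

In this sketch, raising `intensity` makes the trigger pixels stand out more until they hit the clipping limit, and no adversary ever needs to see another adversary's features.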
Abstract
Federated learning (FL) is vulnerable to backdoor attacks, where adversaries alter model behavior on target classification labels by embedding triggers into data samples. While these attacks have received considerable attention in horizontal FL, they are less understood for vertical FL (VFL), where devices hold different features of the samples, and only the server holds the labels. In this work, we propose a novel backdoor attack on VFL that (i) does not rely on gradient information from the server and (ii) considers potential collusion among multiple adversaries for sample selection and trigger embedding. Our label inference model augments variational autoencoders with metric learning, which adversaries can train locally. A consensus process over the adversary graph topology determines which datapoints to poison. We further propose methods for trigger splitting across the adversaries, with an intensity-based implantation scheme skewing the server towards the trigger. Our convergence analysis reveals the impact of backdoor perturbations on VFL, as indicated by a stationarity gap for the trained model, which we also verify empirically. We conduct experiments comparing our attack with recent backdoor VFL approaches, finding that ours obtains significantly higher success rates for the same main task performance despite not using server information. Additionally, our results verify the impact of collusion on attack performance.
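The consensus step over the adversary graph can be illustrated with a generic average-consensus sketch. This is not the paper's protocol. It assumes each adversary holds a local score per datapoint (for example, its confidence that the datapoint carries the target label), that adversaries exchange scores only with graph neighbors, and that mixing uses Metropolis weights. The names `metropolis_weights` and `consensus_select`, the score values, and the top-k selection rule are all hypothetical.

```python
import numpy as np

def metropolis_weights(adj):
    """Build a symmetric, doubly stochastic mixing matrix from an adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    n = len(adj)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        # Remaining mass stays on the node itself so each row sums to 1.
        W[i, i] = 1.0 - W[i].sum()
    return W

def consensus_select(local_scores, adj, k, rounds=50):
    """Each adversary averages scores with its neighbors, then picks its top-k datapoints."""
    W = metropolis_weights(adj)
    scores = np.asarray(local_scores, dtype=float)  # shape: (adversaries, datapoints)
    for _ in range(rounds):
        scores = W @ scores  # one round of neighbor-only averaging
    # Per-adversary selection; on a connected graph the rows agree after enough rounds.
    return np.argsort(-scores, axis=1)[:, :k]

# Three adversaries on a path graph (0 - 1 - 2), four candidate datapoints.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
local_scores = [[0.9, 0.1, 0.8, 0.2],
                [0.7, 0.2, 0.9, 0.1],
                [0.8, 0.3, 0.6, 0.2]]
picks = consensus_select(local_scores, adj, k=2)
```

In this sketch a better-connected graph mixes faster, so adversaries agree on a common poisoning set in fewer rounds. That is one intuition for why connectivity matters, though the paper's own analysis should be consulted for the precise relationship.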
