Decoupled Vertical Federated Learning for Practical Training on Vertically Partitioned Data
Avi Amalanshu, Yash Sirvi, David I. Inouye
TL;DR
DVFL is a decoupled approach to Vertical Federated Learning that eliminates forward and backward locking: guests and hosts train asynchronously on local self-supervised objectives, and the label owner trains a separate transfer learner. The three-tier DVFL hierarchy uses per-guest registers and input replay to support fault tolerance, redundancy, and learning from data beyond the intersection, while preserving gradient privacy. Empirically, DVFL degrades gracefully under faults, redundancy further improves performance, and DVFL can outperform traditional VFL baselines on several vertically partitioned datasets. The method also offers engineering flexibility in communication and computation, which supports scalable, privacy-preserving learning on vertically partitioned data in realistic, imperfect networks.
Abstract
Vertical Federated Learning (VFL) is an emergent distributed machine learning paradigm for collaborative learning between clients who have disjoint features of common entities. However, standard VFL lacks fault tolerance: each participant and each connection is a single point of failure. Prior attempts to induce fault tolerance in VFL focus on the scenario of "straggling clients", which usually assumes that all messages eventually arrive or that there is an upper bound on the number of late messages. To handle the more general problem of arbitrary crashes, we propose Decoupled VFL (DVFL). So that training can proceed despite faults, DVFL decouples training between communication rounds using local unsupervised objectives. By further decoupling label supervision from aggregation, DVFL also enables redundant aggregators. As secondary benefits, DVFL can enhance data efficiency and provides immunity against gradient-based attacks. In this work, we implement DVFL for split neural networks with a self-supervised autoencoder loss. When there are faults, DVFL outperforms the best VFL-based alternative (97.58% vs 96.95% on an MNIST task). Even under perfect conditions, its performance is comparable.
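The decoupling idea can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces split neural networks with a tiny pure-Python linear autoencoder, and every name in it (make_guest, local_step, the offline schedule, the synthetic data) is a hypothetical chosen for illustration. Each guest holds a different feature slice of the same entities and minimizes only its own reconstruction loss, so no gradient ever arrives from another party and a crashed guest does not block the others.

```python
import random

def make_guest(dim, rng):
    # Small random init; an exact zero init would sit on a saddle point.
    return {"enc": [rng.uniform(-0.1, 0.1) for _ in range(dim)],
            "dec": [rng.uniform(-0.1, 0.1) for _ in range(dim)]}

def recon_loss(guest, x):
    # One-unit bottleneck: z = enc . x, reconstruction = z * dec.
    z = sum(e * xi for e, xi in zip(guest["enc"], x))
    return sum((z * d - xi) ** 2 for d, xi in zip(guest["dec"], x))

def local_step(guest, x, lr=0.02):
    # One SGD step on the guest's own reconstruction loss.
    # No label and no gradient from any other participant is used.
    enc, dec = guest["enc"], guest["dec"]
    z = sum(e * xi for e, xi in zip(enc, x))
    resid = [z * d - xi for d, xi in zip(dec, x)]
    back = sum(r * d for r, d in zip(resid, dec))  # (dLoss/dz) / 2
    guest["dec"] = [d - lr * 2 * z * r for d, r in zip(dec, resid)]
    guest["enc"] = [e - lr * 2 * back * xi for e, xi in zip(enc, x)]

def mean_loss(guest, rows):
    return sum(recon_loss(guest, x) for x in rows) / len(rows)

rng = random.Random(0)

# Synthetic entities with 8 features driven by one shared latent factor.
direction = [1.0, 0.5, -0.5, 0.25, 0.8, -0.3, 0.6, 0.1]
samples = []
for _ in range(50):
    t = rng.uniform(-1, 1)
    samples.append([t * c + rng.gauss(0, 0.05) for c in direction])

# Vertical partition: each guest sees a disjoint slice of the columns.
slices = {"guest_a": slice(0, 4), "guest_b": slice(4, 8)}
views = {name: [row[s] for row in samples] for name, s in slices.items()}
guests = {name: make_guest(4, rng) for name in slices}
initial = {name: mean_loss(guests[name], views[name]) for name in guests}

# Simulated crash: guest_b is unreachable for rounds 10..29.
offline = {"guest_b": range(10, 30)}

for rnd in range(100):
    for name, guest in guests.items():
        if rnd in offline.get(name, ()):
            continue  # this guest misses the round; the others proceed
        for x in views[name]:
            local_step(guest, x)

final = {name: mean_loss(guests[name], views[name]) for name in guests}
for name in guests:
    print(name, "initial:", round(initial[name], 4),
          "final:", round(final[name], 4))
```

In a fuller pipeline the one-dimensional codes produced by each encoder would be forwarded to an aggregator, and the label owner would fit its own model on top of them. That stage is omitted here to keep the sketch short.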
