Cross-Silo Federated Learning for Multi-Tier Networks with Vertical and Horizontal Data Partitioning
Anirban Das, Timothy Castiglia, Shiqiang Wang, Stacy Patterson
TL;DR
This work addresses learning over data that are vertically partitioned across silos and horizontally partitioned within silos in a cross-silo federated setting. It introduces Tiered Decentralized Coordinate Descent (TDCD), which interleaves coordinate descent across silo hubs with local SGD inside silos. TDCD reduces communication by having clients perform $Q$ local updates between communication rounds and by having silos exchange embeddings with one another. The authors provide a convergence analysis showing a rate of $O(1/\sqrt{R})$ under standard assumptions, and discuss how the bound scales with the number of silos $N$ and the local-update parameter $Q$; special cases recover established results for local SGD and vertical FL. Empirical evaluations on CIFAR-10, MIMIC-III, and ModelNet40 demonstrate TDCD’s robustness to increased partitioning and its communication efficiency, especially in latency-dominated networks. The results offer practical guidance for setting $Q$ in different latency regimes and highlight TDCD’s potential for scalable, privacy-preserving learning in multi-tier organizations.
Abstract
We consider federated learning in tiered communication networks. Our network model consists of a set of silos, each holding a vertical partition of the data. Each silo contains a hub and a set of clients, with the silo's vertical data shard partitioned horizontally across its clients. We propose Tiered Decentralized Coordinate Descent (TDCD), a communication-efficient decentralized training algorithm for such two-tiered networks. To reduce communication overhead, the clients in each silo perform multiple local gradient steps before sharing updates with their hub. Each hub adjusts its coordinates by averaging its clients' updates, and then hubs exchange intermediate updates with one another. We present a theoretical analysis of our algorithm and show the dependence of the convergence rate on the number of vertical partitions and the number of local updates. We further validate our approach empirically via simulation-based experiments using a variety of datasets and objectives.
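The two-tier update pattern described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on simplifying assumptions: a linear model with squared loss, full-batch gradients in place of SGD, labels available at every client, and all communication simulated in a single process. The names `make_toy_problem` and `tdcd_sketch` and all parameter values are hypothetical.

```python
import numpy as np

def make_toy_problem(n=200, n_silos=3, d_per_silo=2, n_clients=4, seed=0):
    """Build a synthetic regression task split vertically, then horizontally."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, n_silos * d_per_silo))
    y = X @ np.ones(X.shape[1]) + 0.01 * rng.normal(size=n)
    # Vertical partition: each silo holds one block of feature columns.
    X_blocks = np.split(X, n_silos, axis=1)
    # Horizontal partition: each silo spreads the sample rows over its clients.
    client_rows = [np.array_split(rng.permutation(n), n_clients)
                   for _ in range(n_silos)]
    return X_blocks, y, client_rows

def tdcd_sketch(X_blocks, y, client_rows, rounds=50, Q=5, lr=0.1):
    """Toy two-tier loop: embedding exchange, Q local steps, hub averaging."""
    thetas = [np.zeros(X.shape[1]) for X in X_blocks]
    losses = []
    for _ in range(rounds):
        # Hubs exchange embeddings (here: each silo's partial prediction).
        emb = [X @ th for X, th in zip(X_blocks, thetas)]
        total = np.sum(emb, axis=0)
        losses.append(0.5 * np.mean((total - y) ** 2))
        new_thetas = []
        for j, (X, th) in enumerate(zip(X_blocks, thetas)):
            others = total - emb[j]  # other silos' embeddings, stale during local steps
            models, sizes = [], []
            for rows in client_rows[j]:
                w = th.copy()
                Xc, yc, oc = X[rows], y[rows], others[rows]
                for _ in range(Q):  # Q local gradient steps on this silo's block only
                    resid = Xc @ w + oc - yc
                    w -= lr * Xc.T @ resid / len(rows)
                models.append(w)
                sizes.append(len(rows))
            # Hub averages its clients' models, weighted by local sample count.
            new_thetas.append(np.average(models, axis=0, weights=sizes))
        thetas = new_thetas
    return thetas, losses

if __name__ == "__main__":
    X_blocks, y, client_rows = make_toy_problem()
    thetas, losses = tdcd_sketch(X_blocks, y, client_rows)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")
```

In this sketch, raising `Q` means more local computation per embedding exchange. That is the communication trade-off the analysis studies through its dependence on the number of local updates.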
