Robust Collaborative Inference with Vertically Split Data Over Dynamic Device Environments
Surojit Ganguli, Zeyu Zhou, Christopher G. Brinton, David I. Inouye
TL;DR
The paper tackles robust inference for vertically split data across dynamic edge networks, introducing Dynamic Network VFL (DN-VFL) and the MAGS framework. MAGS combines (i) fault simulation during training via dropout (including communication dropout), (ii) replication of aggregators through MACL (and the low-cost 4-MACL variant), and (iii) gossip-based ensembling to reduce prediction variance at test time. The theoretical analysis shows that the fault-tolerant risk under dynamic conditions is bounded by a term that scales with the number of aggregators, and that gossiping lowers ensemble risk through diversity, with variance decaying as a function of the number of gossip rounds and the graph spectral radius. Empirically, MAGS remains robust at high fault rates (up to 50%) on several datasets, often surpassing baselines by more than 20 percentage points, which supports the practical viability of decentralized, fault-tolerant vertically split learning in safety-critical edge environments. The work positions DN-VFL as a foundation for robust, privacy-conscious collaboration in dynamic networks and points to future extensions in asynchronous communication and privacy-preserving variants.
Abstract
When each edge device of a network only perceives a local part of the environment, collaborative inference across multiple devices is often needed to predict global properties of the environment. In safety-critical applications, collaborative inference must be robust to significant network failures caused by environmental disruptions or extreme weather. Existing collaborative learning approaches, such as privacy-focused Vertical Federated Learning (VFL), typically assume a centralized setup or that one device never fails. However, these assumptions make prior approaches susceptible to significant network failures. To address this problem, we first formalize the problem of robust collaborative inference over a dynamic network of devices that could experience significant network faults. Then, we develop a minimalistic yet impactful method called Multiple Aggregation with Gossip Rounds and Simulated Faults (MAGS) that combines simulated faults (via dropout), replication, and gossiping to significantly improve robustness over baselines. We also theoretically analyze our proposed approach to explain why each component enhances robustness. Extensive empirical results validate that MAGS is robust across a range of fault rates, including extreme ones.
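To give a feel for the gossiping component, the following is a minimal sketch of plain neighbor averaging, not the MAGS protocol itself. The ring topology, the noise model, and the names `gossip_average`, `adjacency`, and `preds` are assumptions made for illustration. Each device holds a noisy scalar prediction and repeatedly averages it with its neighbors' values; the spread of predictions across devices shrinks as the number of rounds grows.

```python
import numpy as np

def gossip_average(preds, adjacency, rounds):
    """Run `rounds` of neighbor averaging over an undirected graph (illustrative only)."""
    n = adjacency.shape[0]
    # Self-loops let each device keep its own value in the average.
    a = adjacency + np.eye(n)
    # Row-normalize so each row is a set of averaging weights.
    w = a / a.sum(axis=1, keepdims=True)
    out = preds.copy()
    for _ in range(rounds):
        out = w @ out
    return out

rng = np.random.default_rng(0)
n_devices = 8

# Hypothetical ring topology: device i talks to devices i-1 and i+1.
adjacency = np.zeros((n_devices, n_devices))
for i in range(n_devices):
    j = (i + 1) % n_devices
    adjacency[i, j] = 1
    adjacency[j, i] = 1

# Assumed noisy per-device predictions around a common value.
preds = 0.7 + 0.1 * rng.standard_normal(n_devices)

# Standard deviation across devices after 0, 1, ..., 4 gossip rounds.
spread = [gossip_average(preds, adjacency, r).std() for r in range(5)]
print(spread)
```

On this regular graph the averaging matrix is symmetric, so the mean prediction is preserved while the disagreement between devices contracts every round, at a rate governed by the second-largest eigenvalue magnitude of the averaging matrix. With faulty links or irregular graphs the behavior is less clean, which is part of what the paper's analysis addresses.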
