Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning
Jon Irureta, Gorka Azkune, Jon Imaz, Aizea Lojo, Javier Fernandez-Marques
TL;DR
Real-world Vertical Federated Learning faces partial sample alignment across participants, which reduces data utilization and increases vulnerability to malicious inputs. The authors present Split-MoPE, a framework that fuses SplitNN with a Mixture of Predefined Experts (MoPE) where each expert targets a specific data alignment, enabling single-round training with pretrained encoders. Key contributions include alignment-robust performance without full overlap, inherent per-sample interpretability, and reduced communication overhead, demonstrated across CIFAR-10/100 and Breast Cancer Wisconsin. This approach advances practical, privacy-preserving collaborative learning by handling misalignment, ensuring robustness, and enabling accountable data valuation in cross-institution settings.
Abstract
Vertical Federated Learning (VFL) has emerged as a critical paradigm for collaborative model training in privacy-sensitive domains such as finance and healthcare. However, most existing VFL frameworks rely on the idealized assumption of full sample alignment across participants, a premise that rarely holds in real-world scenarios. To bridge this gap, this work introduces Split-MoPE, a novel framework that integrates Split Learning with a specialized Mixture of Predefined Experts (MoPE) architecture. Unlike standard Mixture of Experts (MoE), where routing is learned dynamically, MoPE uses predefined experts to process specific data alignments, effectively maximizing data usage during both training and inference without requiring full sample overlap. By leveraging pretrained encoders for target data domains, Split-MoPE achieves state-of-the-art performance in a single communication round, significantly reducing the communication footprint compared to multi-round end-to-end training. Furthermore, unlike existing proposals that address sample misalignment, this novel architecture provides inherent robustness against malicious or noisy participants and offers per-sample interpretability by quantifying each collaborator's contribution to each prediction. Extensive evaluations on vision (CIFAR-10/100) and tabular (Breast Cancer Wisconsin) datasets demonstrate that Split-MoPE consistently outperforms state-of-the-art systems such as LASER and Vertical SplitNN, particularly in challenging scenarios with high data missingness.
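To make the contrast between learned routing and predefined routing concrete, here is a minimal sketch of how samples could be assigned to experts by alignment alone. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how many experts there are or how alignments are encoded, so the sketch assumes one expert per non-empty subset of participants and a boolean availability table. The names `enumerate_alignments`, `route_samples`, and `availability` are hypothetical.

```python
from itertools import combinations


def enumerate_alignments(num_participants):
    """Return every non-empty subset of participant indices as a sorted tuple.

    Assumption: one predefined expert exists per subset. The abstract does
    not specify the expert set, so this is illustrative only.
    """
    ids = range(num_participants)
    return [
        subset
        for size in range(1, num_participants + 1)
        for subset in combinations(ids, size)
    ]


def route_samples(availability):
    """Group sample indices by the set of participants that hold them.

    availability[i][k] is True when participant k has features for sample i.
    Routing is a fixed lookup on this pattern; nothing is learned.
    """
    num_participants = len(availability[0])
    buckets = {a: [] for a in enumerate_alignments(num_participants)}
    for sample_idx, row in enumerate(availability):
        alignment = tuple(k for k, present in enumerate(row) if present)
        if alignment:  # a sample held by no participant cannot be used
            buckets[alignment].append(sample_idx)
    return buckets


# Hypothetical toy data: 5 samples, 3 participants.
availability = [
    (True, True, True),     # fully aligned
    (True, False, True),    # participant 1 is missing
    (False, True, False),   # only participant 1 holds it
    (True, True, True),     # fully aligned
    (False, False, False),  # held by nobody
]
buckets = route_samples(availability)
used = sum(len(indices) for indices in buckets.values())
fully_aligned = len(buckets[(0, 1, 2)])
```

In this toy table, a pipeline that requires full overlap could use only the two fully aligned samples, whereas alignment-based routing uses four of the five, which is the data-usage argument the abstract makes.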
