Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning

Jon Irureta, Gorka Azkune, Jon Imaz, Aizea Lojo, Javier Fernandez-Marques

TL;DR

Real-world Vertical Federated Learning faces partial sample alignment across participants, which reduces data utilization and increases vulnerability to malicious inputs. The authors present Split-MoPE, a framework that fuses SplitNN with a Mixture of Predefined Experts (MoPE) where each expert targets a specific data alignment, enabling single-round training with pretrained encoders. Key contributions include alignment-robust performance without full overlap, inherent per-sample interpretability, and reduced communication overhead, demonstrated across CIFAR-10/100 and Breast Cancer Wisconsin. This approach advances practical, privacy-preserving collaborative learning by handling misalignment, ensuring robustness, and enabling accountable data valuation in cross-institution settings.

Abstract

Vertical Federated Learning (VFL) has emerged as a critical paradigm for collaborative model training in privacy-sensitive domains such as finance and healthcare. However, most existing VFL frameworks rely on the idealized assumption of full sample alignment across participants, a premise that rarely holds in real-world scenarios. To bridge this gap, this work introduces Split-MoPE, a novel framework that integrates Split Learning with a specialized Mixture of Predefined Experts (MoPE) architecture. Unlike standard Mixture of Experts (MoE), where routing is learned dynamically, MoPE uses predefined experts to process specific data alignments, effectively maximizing data usage during both training and inference without requiring full sample overlap. By leveraging pretrained encoders for target data domains, Split-MoPE achieves state-of-the-art performance in a single communication round, significantly reducing the communication footprint compared to multi-round end-to-end training. Furthermore, unlike existing proposals that address sample misalignment, this novel architecture provides inherent robustness against malicious or noisy participants and offers per-sample interpretability by quantifying each collaborator's contribution to each prediction. Extensive evaluations on vision (CIFAR-10/100) and tabular (Breast Cancer Wisconsin) datasets demonstrate that Split-MoPE consistently outperforms state-of-the-art systems such as LASER and Vertical SplitNN, particularly in challenging scenarios with high data missingness.
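
To make the idea of experts tied to specific data alignments concrete, the following is a minimal sketch, not the authors' implementation. It assumes one expert per combination of the active participant plus any subset of passive participants, and it assumes a sample is routed to the expert whose participant set exactly matches the parties that hold that sample. The names `predefined_experts`, `route`, and the party labels "A", "B", "C" are hypothetical, and the actual Split-MoPE layer may combine several experts per sample rather than select exactly one.

```python
from itertools import combinations


def predefined_experts(active, passive):
    """Enumerate one expert key per alignment pattern (assumption):
    the active party plus every subset of the passive parties."""
    experts = []
    for r in range(len(passive) + 1):
        for subset in combinations(passive, r):
            experts.append(frozenset((active, *subset)))
    return experts


def route(available, experts):
    """Pick the expert whose participant set equals the set of parties
    that hold features for this sample (exact-match routing, an assumption)."""
    key = frozenset(available)
    if key not in experts:
        raise ValueError(f"no predefined expert for alignment {sorted(key)}")
    return key


# Hypothetical federation: "A" is the active party (holds the labels),
# "B" and "C" are passive parties.
experts = predefined_experts("A", ["B", "C"])

# Which parties hold each sample; partial alignment means some are missing.
samples = {
    "s1": {"A", "B", "C"},  # fully aligned
    "s2": {"A", "C"},       # missing at B
    "s3": {"A"},            # only the active party has it
}

assignment = {sid: sorted(route(parties, experts)) for sid, parties in samples.items()}
for sid, expert in assignment.items():
    print(sid, "->", expert)
```

Because routing in this sketch depends only on which parties hold a sample, no sample is discarded for lacking full overlap, which is the data-usage property the abstract describes. With two passive parties the sketch enumerates four experts; the count doubles with each additional passive party under this assumption.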

Paper Structure (20 sections, 5 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: Mean performance of different VFL methods across CIFAR-10 and CIFAR-100. The left plot shows mean accuracies under varying data missingness ratios, where a higher $p_{miss}$ implies more missing data, while the right plot shows mean accuracies as the fraction of noisy or malicious participants in the federation increases.
  • Figure 2: (a) Classical VFL data partition, where the sample space is identical for every participant. (b) SplitNN's high-level architecture with the forward pass (solid line) and the backpropagation steps (dashed line).
  • Figure 3: Real data partition of VFL where samples among participants are not fully aligned.
  • Figure 4: Our proposed Split-MoPE, combining Split learning with a modified classification head, $h$, which consists of a Mixture of Predefined Experts (MoPE) layer.
  • Figure 5: Representation of the forward pass in MoPE. The embeddings corresponding to each participant's information are represented in different colors (active in green), and each Expert in the layer processes a combination of them.
  • ...and 3 more figures