Towards Active Participant Centric Vertical Federated Learning: Some Representations May Be All You Need

Jon Irureta, Jon Imaz, Aizea Lojo, Javier Fernandez-Marques, Marco González, Iñigo Perona

TL;DR

The paper addresses vertical federated learning under partial data overlap and high communication costs by introducing Active Participant Centric VFL (APC-VFL). APC-VFL leverages local unsupervised representation learning with autoencoders at each participant and a knowledge-distillation step at the active party to produce enhanced representations, enabling the active party to train a classifier on augmented data with only a single communication exchange. It demonstrates up to $634\times$, $380\times$, and $1590\times$ reductions in communication rounds on MIMIC-III, Breast Cancer Wisconsin, and UCI Credit Card, respectively, while achieving gains in F1 and accuracy compared to VFedTrans and, in certain settings, matching SplitNN under full alignment. The approach supports inference without ongoing collaboration, preserves privacy by keeping local encoders private, and shows practical impact for real-world VFL deployments across tabular datasets, with future work exploring other representation techniques and multi-modal scenarios.

Abstract

Existing Vertical FL (VFL) methods often struggle with realistic, unaligned data partitions, and incur high communication costs and significant operational complexity. This work introduces a novel approach to VFL, Active Participant Centric VFL (APC-VFL), that excels in scenarios where data samples among participants are only partially aligned at training time. Among its strengths, APC-VFL requires only a single communication step with the active participant. This is made possible by a local, unsupervised representation learning stage at each participant, followed by a knowledge distillation step at the active participant. Compared to other VFL methods such as SplitNN or VFedTrans, APC-VFL consistently outperforms them across three popular VFL datasets in terms of F1, accuracy and communication costs as the ratio of aligned data is reduced.
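The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes synthetic data, replaces the neural autoencoders with PCA-based linear autoencoders, uses a least-squares student for the distillation step, and trains a small logistic regression as the final classifier. All names (`fit_linear_autoencoder`, `encode`, `train_logreg`, `messages`) are hypothetical and exist only for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    # Assumption: a PCA-based linear autoencoder stands in for the neural one.
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T  # encoder parameters: centre, projection matrix

def encode(X, model):
    mean, W = model
    return (X - mean) @ W

def train_logreg(Z, y, lr=0.1, steps=200):
    # Plain gradient-descent logistic regression with a bias column.
    Zb = np.hstack([Z, np.ones((len(Z), 1))])
    w = np.zeros(Zb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Zb @ w))
        w -= lr * Zb.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(0)
n_active, n_passive, n_aligned = 200, 120, 50

# Synthetic features; the first n_aligned rows of each party are the aligned samples.
X_active = rng.normal(size=(n_active, 6))
X_passive = rng.normal(size=(n_passive, 4))
y = (X_active[:, 0] > 0).astype(float)  # labels are held only by the active party

# Step 1: local, unsupervised representation learning at each participant.
ae_active = fit_linear_autoencoder(X_active, k=3)
ae_passive = fit_linear_autoencoder(X_passive, k=2)

# Single communication step: the passive party sends latent codes of aligned samples.
messages = [encode(X_passive[:n_aligned], ae_passive)]

# Step 2: the active party learns a joint representation on the aligned samples.
H_active = encode(X_active, ae_active)
joint_input = np.hstack([H_active[:n_aligned], messages[0]])
ae_joint = fit_linear_autoencoder(joint_input, k=3)
Z_joint = encode(joint_input, ae_joint)  # teacher targets

# Step 3: knowledge distillation (MSE): a student maps active-only codes to Z_joint.
A = np.hstack([H_active[:n_aligned], np.ones((n_aligned, 1))])
W_student, *_ = np.linalg.lstsq(A, Z_joint, rcond=None)

# Step 4: the student produces representations for all active samples,
# including unaligned ones, and the classifier is trained on them.
Z_enhanced = np.hstack([H_active, np.ones((n_active, 1))]) @ W_student
w_clf = train_logreg(Z_enhanced, y)
```

In this sketch the passive party's encoder never leaves its owner, and inference needs only `ae_active`, `W_student` and `w_clf`, all of which live at the active party. Because every map here is linear, the student cannot capture what a nonlinear network would; the sketch shows the order of operations, not the reported performance.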

Paper Structure

This paper contains 36 sections, 13 equations, 16 figures, 4 tables, 1 algorithm.

Figures (16)

  • Figure 1: Representation of the forward pass and backpropagation in a Vertical SplitNN scenario with three participants.
  • Figure 2: Data partition in VFL with two participants. $\mathcal{D}_A$, the set of aligned samples, is a small fraction of the whole dataset.
  • Figure 3: Proposed VFL process with APC-VFL, comprising four main steps. ①: local representation learning. ②: aligned representation learning. ③: knowledge distillation. ④: final classification model training.
  • Figure 4: Adjustment of the proposal to enable comparison with vertical SplitNN and to assess the quality of the joint latent representations.
  • Figure 5: Mean results of the tested data partitions with different quantities of aligned samples. In APC-VFL, the loss used for knowledge distillation is shown in parentheses. Macro- and weighted-averaged results for MIMIC-III can be found in the appendix.
  • ...and 11 more figures