Vertical Federated Learning in Practice: The Good, the Bad, and the Ugly

Zhaomin Wu, Zhen Qin, Junyi Hou, Haodong Zhao, Qinbin Li, Bingsheng He, Lixin Fan

TL;DR

This survey investigates Vertical Federated Learning (VFL) deployment gaps by analyzing real-world data distributions with WikiDBs, revealing that most potential VFL scenarios are latent or fuzzy with highly imbalanced feature distributions. It introduces a data-oriented taxonomy of VFL algorithms across key alignment, feature balance, communication, and trustworthiness, highlighting that only a small fraction of existing work addresses precisely matched keys and balanced features. The study identifies four practical challenges: widespread latent/fuzzy relationships, heterogeneity of data across parties, limited precise record alignment, and trust/privacy concerns. It advocates targeted research directions to bridge theory and real-world VFL deployment. The findings underscore the need for robust, data-aware VFL methods that handle partial matching, imperfect alignment, and diverse data while ensuring privacy, security, and fair contribution mechanisms in practice.

Abstract

Vertical Federated Learning (VFL) is a privacy-preserving collaborative learning paradigm that enables multiple parties with distinct feature sets to jointly train machine learning models without sharing their raw data. Despite its potential to facilitate cross-organizational collaborations, the deployment of VFL systems in real-world applications remains limited. To investigate the gap between existing VFL research and practical deployment, this survey analyzes the real-world data distributions in potential VFL applications and identifies four key findings that highlight this gap. We propose a novel data-oriented taxonomy of VFL algorithms based on real VFL data distributions. Our comprehensive review of existing VFL algorithms reveals that some common practical VFL scenarios have few or no viable solutions. Based on these observations, we outline key research directions aimed at bridging the gap between current VFL research and real-world applications.

Paper Structure

This paper contains 40 sections, 2 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Analysis and visualization of feature distributions and overlaps across real-world databases.
  • Figure 2: Taxonomy of VFL algorithms based on feature balance, key alignment, communication, and trustworthiness.