
Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey

Mang Ye, Wei Shen, Bo Du, Eduard Snezhko, Vassili Kovalev, Pong C. Yuen

TL;DR

This survey provides a systematic taxonomy of Vertical Federated Learning (VFL) across three core lenses: effectiveness (model design and feature/client selection), security (privacy leakage and malicious attacks with defenses), and applicability (data scarcity, communication constraints, and asynchrony). It consolidates recent methods, benchmarks, and practical considerations, offering a unified view of the field and proposing future research directions, including open datasets and foundation-model integration. The work highlights the need to balance privacy, performance, and efficiency while enabling cross-domain collaboration with minimal raw-data exposure. Overall, this survey aims to accelerate practical adoption of VFL by guiding researchers and practitioners toward cohesive, secure, and scalable solutions.

Abstract

Vertical Federated Learning (VFL) is a privacy-preserving distributed learning paradigm in which different parties collaboratively learn models using partitioned features of shared samples, without leaking private data. Recent research has shown promising results in addressing various challenges in VFL, highlighting its potential for practical cross-domain collaboration. However, this research is scattered and lacks organization. To advance VFL research, this survey offers a systematic overview of recent developments. First, we provide a history and background introduction, along with a summary of the general training protocol of VFL. We then revisit the taxonomy in recent reviews and analyze its limitations in depth. For a comprehensive and structured discussion, we synthesize recent research from three fundamental perspectives: effectiveness, security, and applicability. Finally, we discuss several critical future research directions in VFL to facilitate further development of the field. We maintain a collection of research lists and update them periodically at https://github.com/shentt67/VFL_Survey.

Paper Structure

This paper contains 25 sections, 3 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: A Practical Application of Vertical Federated Learning. We present a practical cross-domain collaboration with three participants: a mall, a video platform, and a bank. The mall acts as the active client, collaborating with the video platform and the bank as passive clients. Each client holds its own local features for the same set of users, along with its local model. The active client holds the task labels, e.g., whether a user will buy a tie. A global model makes the final prediction for the shared (aligned) users by aggregating feature embeddings from all clients. From the prediction results and labels, gradients are calculated to update both the global and local models. Optionally, a third-party coordinator can be employed for secure communication and sample alignment.
  • Figure 3: The general flow of training and testing in VFL. (a) During training, embeddings of aligned samples are sent to the active client, where gradients are calculated based on the task labels. The overall objective is to optimize collaborative prediction. These gradients are then sent back to each client for model updating. (b) During testing, predictions on aligned samples are made using the trained global and local models.
  • Figure 4: Illustration of tree-based and neural-network-based models. (a) For the tree-based model in VFL, each client trains part of the tree model using its partitioned features. These partial trees are then combined to construct the global tree model. (b) For the neural-network-based model, each client trains its local model to extract embeddings. The global model is then trained in the active client using embeddings from all clients.
  • Figure 5: Illustration of feature and client selection. (a) Feature selection aims to identify crucial features for collaborative training across all clients. (b) Client selection involves choosing the clients that are essential for collaboration.
  • Figure 6: Inference Attack. (a) The attacker can infer the raw features using gradients and intermediate embeddings. (b) The attacker can infer the task labels using the trained local model and the gradients.
  • ...and 3 more figures
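Figure 1 notes that a coordinator can handle sample alignment. The sketch below illustrates only the idea of matching shared users without exchanging raw IDs in the clear: each client hashes its IDs with a shared salt, and the intersection is taken over the digests. The function names (`blind_ids`, `align_samples`), the salt, and the toy IDs are hypothetical. Salted hashing is a simplified stand-in, not a secure private set intersection (PSI) protocol; low-entropy IDs such as phone numbers can be recovered by a dictionary attack.

```python
import hashlib


def blind_ids(ids, salt):
    """Map salted SHA-256 digests back to raw IDs (illustrative only, NOT a secure PSI)."""
    return {hashlib.sha256((salt + uid).encode("utf-8")).hexdigest(): uid for uid in ids}


def align_samples(client_ids, salt):
    """Return the sorted raw IDs held by every client, matched through their digests."""
    blinded = [blind_ids(ids, salt) for ids in client_ids]
    common = set(blinded[0])
    for table in blinded[1:]:
        common &= set(table)          # only digests are compared across clients
    # Each client maps the shared digests back to its own raw IDs locally.
    return sorted(blinded[0][d] for d in common)


# Hypothetical user IDs held by the three participants of Figure 1.
mall = ["u1", "u2", "u3", "u5"]
video = ["u2", "u3", "u4", "u5"]
bank = ["u1", "u3", "u5", "u6"]
print(align_samples([mall, video, bank], salt="shared-secret"))  # → ['u3', 'u5']
```

Only users present at all three clients (`u3`, `u5`) become training samples; the remaining IDs are never revealed in plain form to the other parties in this sketch.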
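The training flow in Figures 1, 3(a) and 4(b) can be illustrated with the simplest neural-network-based case: a logistic regression whose weights are split across clients. This is a minimal sketch under stated assumptions: each local (bottom) model is linear and outputs a scalar partial logit, the global model just sums the partial logits and adds a bias, the loss is binary cross-entropy, and no encryption or coordinator is modeled. The class names, toy data, and learning rate are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class PassiveClient:
    """Holds a private feature slice and a linear local model; never sees the labels."""

    def __init__(self, features):
        self.features = features            # features[i]: this client's slice of aligned sample i
        self.w = [0.0] * len(features[0])

    def forward(self, i):
        # Local "embedding": a scalar partial logit computed from private features only.
        return sum(w * x for w, x in zip(self.w, self.features[i]))

    def backward(self, i, grad_z, lr):
        # z is a sum of partial logits, so dL/dw = dL/dz * x for this client's features.
        self.w = [w - lr * grad_z * x for w, x in zip(self.w, self.features[i])]


class ActiveClient(PassiveClient):
    """Additionally holds the task labels and the global model (here only a bias)."""

    def __init__(self, features, labels):
        super().__init__(features)
        self.labels = labels
        self.b = 0.0

    def step(self, i, partials, lr):
        z = self.forward(i) + sum(partials) + self.b   # aggregate all clients' outputs
        p = sigmoid(z)
        y = self.labels[i]
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        grad_z = p - y                                 # dL/dz for sigmoid + binary cross-entropy
        self.b -= lr * grad_z
        self.backward(i, grad_z, lr)
        return loss, grad_z


def train_epoch(active, passives, lr=0.5):
    total = 0.0
    n = len(active.labels)
    for i in range(n):
        partials = [c.forward(i) for c in passives]    # sent to the active client
        loss, grad_z = active.step(i, partials, lr)
        for c in passives:
            c.backward(i, grad_z, lr)                  # gradient returned to each passive client
        total += loss
    return total / n


# Toy aligned data for 4 shared users (hypothetical values).
mall = ActiveClient([[1.0], [0.0], [1.0], [0.0]], labels=[1, 0, 1, 0])
video = PassiveClient([[1.0], [1.0], [0.0], [0.0]])
bank = PassiveClient([[0.5], [-0.5], [0.5], [-0.5]])
losses = [train_epoch(mall, [video, bank]) for _ in range(50)]
print(round(losses[0], 3), round(losses[-1], 3))
```

In this sketch only partial logits travel to the active client and only the scalar gradient travels back, so raw features and labels stay local. The returned gradient can still leak information, which is the concern behind the inference attacks in Figure 6.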
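For the tree-based setting in Figure 4(a), one common design, in the style of SecureBoost, has the label-holding active client compute per-sample first- and second-order gradients, while each client aggregates them over candidate splits of its own features and the active client picks the best split globally. The sketch below assumes XGBoost's split-gain formula with regularization `lam` and no complexity penalty, and it omits the homomorphic encryption that real systems apply to the gradients. All function names and the toy data are hypothetical.

```python
def candidate_sums(column, g, h):
    """Client side: for each threshold on one private feature, sum g and h of the left branch."""
    out = []
    for t in sorted(set(column))[:-1]:              # the largest value cannot split anything off
        left = [i for i, v in enumerate(column) if v <= t]
        out.append((t, sum(g[i] for i in left), sum(h[i] for i in left)))
    return out


def split_gain(GL, HL, G, H, lam=1.0):
    """XGBoost-style gain of splitting a node with totals (G, H) into left/right children."""
    GR, HR = G - GL, H - HL
    return 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - G ** 2 / (H + lam))


def best_split(clients, labels, preds, lam=1.0):
    """Active side: compute gradients for logistic loss, then choose the best split over all clients."""
    g = [p - y for p, y in zip(preds, labels)]      # first-order gradients
    h = [p * (1 - p) for p in preds]                # second-order gradients
    G, H = sum(g), sum(h)
    best = None
    for name, columns in clients.items():
        for f, column in enumerate(columns):
            # In a real system only the aggregated (encrypted) sums leave the client.
            for t, GL, HL in candidate_sums(column, g, h):
                gain = split_gain(GL, HL, G, H, lam)
                if best is None or gain > best[3]:
                    best = (name, f, t, gain)
    return best


# Hypothetical features: each client holds one column for the same 4 aligned users.
clients = {"mall": [[3, 5, 1, 2]], "video": [[1, 3, 2, 4]]}
labels = [1, 1, 0, 0]
preds = [0.5, 0.5, 0.5, 0.5]                        # initial predictions of the boosted model
print(best_split(clients, labels, preds))           # best: mall's feature 0 at threshold 2
```

The chosen split (client, feature, threshold) becomes a node of the global tree, while the threshold itself can remain private to the client that owns the feature.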
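Figure 6(b) shows that gradients returned to a passive client can expose labels. One of the simplest instances, shown below as an illustrative sketch rather than a specific attack from the survey, arises for binary classification with a sigmoid output and cross-entropy loss: the gradient with respect to the logit is `sigmoid(z) - y`, and since `sigmoid(z)` lies strictly between 0 and 1, its sign alone reveals the label. The logits and labels are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_wrt_logit(z, y):
    # What the active client sends back for binary cross-entropy on sigmoid(z).
    return sigmoid(z) - y


def infer_labels(received_grads):
    # sigmoid(z) is in (0, 1), so the gradient is negative exactly when y == 1.
    return [1 if g < 0 else 0 for g in received_grads]


true_labels = [1, 0, 0, 1, 1]
logits = [2.3, 1.7, -0.4, -3.0, 0.0]     # arbitrary current outputs of the global model
grads = [gradient_wrt_logit(z, y) for z, y in zip(logits, true_labels)]
print(infer_labels(grads))               # → [1, 0, 0, 1, 1]
```

The attacker never touches the labels directly, yet recovers all of them from per-sample gradients; defenses therefore typically try to obscure or perturb what is returned to passive clients.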
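For the client selection described in Figure 5(b), one simple heuristic (an assumption for illustration, not a method taken from the survey) is leave-one-out scoring: the active client measures how much the validation loss rises when one client's contribution is dropped and keeps the top-k clients. The sketch assumes a trained model whose global output is the sum of per-client partial logits, as in a split logistic regression; the client names and validation values are hypothetical.

```python
import math


def bce(z, y):
    """Binary cross-entropy of sigmoid(z) against label y."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def mean_loss(partials, labels, include):
    # partials[name][i]: client `name`'s partial logit for validation sample i.
    n = len(labels)
    return sum(bce(sum(partials[c][i] for c in include), labels[i]) for i in range(n)) / n


def leave_one_out(partials, labels):
    """Score each client by the loss increase when its partial logit is removed."""
    everyone = list(partials)
    base = mean_loss(partials, labels, everyone)
    return {c: mean_loss(partials, labels, [o for o in everyone if o != c]) - base
            for c in everyone}


def select_clients(partials, labels, k):
    scores = leave_one_out(partials, labels)
    return sorted(scores, key=scores.get, reverse=True)[:k]


labels = [1, 0, 1, 0]
partials = {
    "mall": [2.0, -2.0, 1.5, -1.5],   # strongly informative
    "video": [0.1, 0.1, -0.1, -0.1],  # close to noise
    "bank": [1.0, -1.0, 1.0, -1.0],   # informative
}
print(select_clients(partials, labels, k=2))  # → ['mall', 'bank']
```

Leave-one-out requires one extra evaluation per client and ignores interactions between clients; it is used here only because it makes the selection criterion explicit.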