
Vertical Federated Learning: Concepts, Advances and Challenges

Yang Liu, Yan Kang, Tianyuan Zou, Yanhong Pu, Yuanqin He, Xiaozhou Ye, Ye Ouyang, Ya-Qin Zhang, Qiang Yang

TL;DR

This work surveys Vertical Federated Learning (VFL), a privacy-preserving paradigm in which parties hold disjoint feature sets for the same users. It analyzes the VFL framework, problem formulation, and training protocols, and catalogs advances in efficiency, effectiveness, and privacy defenses, culminating in the VFLow optimization framework, which jointly considers privacy, computation, communication, effectiveness, and fairness. The paper also covers attacks and defenses (cryptographic and non-cryptographic), data valuation, explainability, and fairness, and highlights industrial applications across finance, healthcare, and advertising, while outlining open challenges such as interoperability and trustworthy deployment. By providing a unified taxonomy and framework, the work aims to guide future research toward robust, scalable, and auditable VFL systems.

Abstract

Vertical Federated Learning (VFL) is a federated learning setting where multiple parties with different features about the same set of users jointly train machine learning models without exposing their raw data or model parameters. Motivated by the rapid growth in VFL research and real-world applications, we provide a comprehensive review of the concept and algorithms of VFL, as well as current advances and challenges in various aspects, including effectiveness, efficiency, and privacy. We provide an exhaustive categorization of VFL settings and privacy-preserving protocols and comprehensively analyze the privacy attacks and defense strategies for each protocol. We then propose a unified framework, termed VFLow, which considers the VFL problem under communication, computation, privacy, effectiveness, and fairness constraints. Finally, we review the most recent advances in industrial applications, highlighting open challenges and future directions for VFL.

Paper Structure (37 sections, 14 equations, 9 figures, 10 tables, 1 algorithm)


Figures (9)

  • Figure 1: Three categories of Federated Learning
  • Figure 2: Relationships between sections in this work.
  • Figure 3: Illustration of the VFL system with three parties (two passive parties and one active party). $\mathcal{G}_1, \mathcal{G}_2$, and $\mathcal{G}_3$ denote the local models of the three parties, respectively, and $\mathcal{F}_3$ denotes the global module owned by the active party. The VFL training protocol typically involves two steps: 1) the three parties align their samples via private entity alignment; 2) the three parties collaboratively train $\mathcal{G}_1, \mathcal{G}_2, \mathcal{G}_3$ and $\mathcal{F}_3$ in a privacy-preserving manner (see Section \ref{sec:vfl_training} for details).
  • Figure 4: Four major variants of VFL illustrated with one active party and two passive parties.
  • Figure 5: The virtual dataset of a two-party VFL. $\mathcal{D}$ denotes the labeled and aligned samples used by the conventional VFL formulated in Eq. (\ref{eq:problem}), whereas $\mathcal{D}^{au}$ denotes aligned but unlabeled samples. $\mathcal{D}_A^{uu}$ and $\mathcal{D}_B^{uu}$ denote unaligned and unlabeled samples of party A and party B, respectively. $\mathcal{D}_A^{ul}$ denotes unaligned and labeled samples of party A.
  • ...and 4 more figures
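
The two-step protocol described in the Figure 3 caption (align samples, then jointly train the local models and the global module) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the party names, the toy data, the linear local models, the sigmoid-of-sum global module, and the hyperparameters. In particular, `align` uses a plain ID intersection as a non-private stand-in for private entity alignment, and the exchanged outputs and gradients are sent unprotected, so the sketch shows only the data flow, not the privacy mechanisms.

```python
import math

# Hypothetical toy data: each party holds different features keyed by user ID.
party_features = {
    "passive_1": {"u1": [0.2, 1.0], "u2": [0.9, 0.1], "u3": [0.5, 0.5]},
    "passive_2": {"u2": [1.5], "u3": [0.3], "u4": [0.7]},
    "active": {"u1": [0.0], "u2": [1.0], "u3": [-1.0]},
}
labels = {"u1": 0, "u2": 1, "u3": 0}  # assumed to be held by the active party only


def align(parties):
    """Step 1 stand-in: plain ID intersection (NOT privacy-preserving)."""
    common = set.intersection(*(set(p) for p in parties.values()))
    return sorted(common)


def local_forward(weights, x):
    """Local model G_k: a linear map from local features to a scalar."""
    return sum(w * xi for w, xi in zip(weights, x))


def global_forward(local_outputs):
    """Global module F: sigmoid of the summed local outputs."""
    return 1.0 / (1.0 + math.exp(-sum(local_outputs)))


def train(parties, labels, epochs=200, lr=0.5):
    """Step 2: joint training over the aligned samples with per-sample SGD."""
    ids = align(parties)
    weights = {k: [0.0] * len(next(iter(p.values()))) for k, p in parties.items()}
    for _ in range(epochs):
        for uid in ids:
            # Each party contributes only its local output, never raw features.
            outs = {k: local_forward(weights[k], parties[k][uid]) for k in parties}
            pred = global_forward(outs.values())
            # Gradient of the log loss w.r.t. each local output; the active
            # party computes it from the label and returns it to every party.
            grad = pred - labels[uid]
            for k in parties:
                weights[k] = [
                    w - lr * grad * xi
                    for w, xi in zip(weights[k], parties[k][uid])
                ]
    return ids, weights


ids, weights = train(party_features, labels)
```

Only users present at every party (`u2` and `u3` in the toy data) take part in training, which mirrors the role of alignment in the caption. A real deployment would replace `align` with a private set intersection protocol and protect the exchanged values with one of the cryptographic or non-cryptographic defenses the paper surveys.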