A Survey of Privacy Threats and Defense in Vertical Federated Learning: From Model Life Cycle Perspective

Lei Yu, Meng Han, Yiming Li, Changting Lin, Yao Zhang, Mingyang Zhang, Yan Liu, Haiqin Weng, Yuseok Jeon, Ka-Ho Chow, Stacy Patterson

TL;DR

This survey addresses privacy threats in Vertical Federated Learning through a life-cycle lens, delivering a taxonomy of attacks (label/feature inference and model extraction) and defenses (cryptographic and non-cryptographic) across data preprocessing, training, deployment, and inference. It distinguishes VFL-specific privacy dynamics from HFL, emphasizes insider adversaries, and surveys a broad range of techniques including HE, MPC, FE, DP, and adversarial training. The work highlights critical gaps in multi-party VFL, tree-model privacy, and end-to-end privacy, and advocates for hybrid, efficiency-conscious defenses and privacy auditing. Collectively, the findings provide practitioners with actionable guidance to safeguard privacy while enabling practical VFL deployments in regulated domains like healthcare and finance.

Abstract

Vertical Federated Learning (VFL) is a federated learning paradigm in which multiple participants, who share the same set of samples but hold different features, jointly train machine learning models. Although VFL enables collaborative machine learning without sharing raw data, it remains susceptible to various privacy threats. In this paper, we conduct the first comprehensive survey of the state of the art in privacy attacks and defenses in VFL. We provide taxonomies for both attacks and defenses, based on their characterizations, and discuss open challenges and future research directions. Specifically, we structure the discussion around the model's life cycle, examining the privacy threats that arise at each stage of machine learning and their corresponding countermeasures. This survey not only serves as a resource for the research community but also offers clear guidance and actionable insights for practitioners to safeguard data privacy throughout the model's life cycle.

Paper Structure

This paper contains 55 sections, 4 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Different Phases of Machine Learning Life-cycle
  • Figure 2: Entity alignment in Data Processing phase.
  • Figure 3: A is an active party, B is a passive party, and C is a coordinator. Step ①: sending public keys; step ②: sending intermediate results; step ③: sending loss; step ④: sending encrypted gradients; step ⑤: sending decrypted gradients.
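
The five steps named in the Figure 3 caption can be sketched as one toy round of encrypted linear regression. This is a minimal sketch under stated assumptions, not the survey's protocol: the figure itself is not reproduced here, so the content of each message, the squared-error loss, the random gradient masks, and all names (`keygen`, `enc`, `dec`, `add`, `mul`, `vfl_round`) are hypothetical. The encryption is a textbook Paillier scheme with a fixed, insecurely small key and integer-only data, chosen only so the example runs without external libraries. Gradients are plain sums of residual times feature, with no scaling or learning rate.

```python
import math
import random

# Toy Paillier (additively homomorphic). Demo-sized key: NOT secure.
P, Q = 2**31 - 1, 2**61 - 1  # two Mersenne primes, assumed here for illustration

def keygen():
    n = P * Q
    phi = (P - 1) * (Q - 1)
    return n, (phi, pow(phi, -1, n))  # public key n, private key (phi, mu)

def enc(n, m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n * n) % (n * n)

def dec(n, priv, c):
    phi, mu = priv
    m = (pow(c, phi, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # map back to signed integers

def add(n, c1, c2):  # Enc(a), Enc(b) -> Enc(a + b)
    return c1 * c2 % (n * n)

def mul(n, c, k):    # Enc(a), plaintext k -> Enc(k * a)
    return pow(c, k % n, n * n)

def vfl_round(x_a, w_a, y, x_b, w_b):
    """One round: A holds x_a and labels y, B holds x_b, C holds the private key."""
    # Step 1: C creates a key pair and sends the public key n to A and B.
    n, priv = keygen()

    # Step 2: B sends encrypted intermediate results to A;
    # A combines them with its own share into the encrypted residual d = u_A + u_B - y.
    u_b = [sum(f * w for f, w in zip(row, w_b)) for row in x_b]
    enc_u_b = [enc(n, u) for u in u_b]
    enc_u_b_sq = enc(n, sum(u * u for u in u_b))
    r_a = [sum(f * w for f, w in zip(row, w_a)) - t for row, t in zip(x_a, y)]
    enc_d = [add(n, c, enc(n, r)) for c, r in zip(enc_u_b, r_a)]

    # Step 3: A assembles the encrypted loss sum(d^2) and sends it to C.
    # sum(d^2) = sum(r_a^2) + sum(u_b^2) + 2 * sum(r_a * u_b)
    enc_loss = add(n, enc(n, sum(r * r for r in r_a)), enc_u_b_sq)
    for c, r in zip(enc_u_b, r_a):
        enc_loss = add(n, enc_loss, mul(n, c, 2 * r))

    # Step 4: each party sends C its encrypted gradient plus a random mask,
    # so C never sees the true gradient after decryption.
    def masked_gradient(x):
        masks = [random.randrange(1, 1000) for _ in x[0]]
        grads = []
        for j, mask in enumerate(masks):
            g = enc(n, mask)
            for row, c in zip(x, enc_d):
                g = add(n, g, mul(n, c, row[j]))
            grads.append(g)
        return grads, masks

    enc_g_a, mask_a = masked_gradient(x_a)
    enc_g_b, mask_b = masked_gradient(x_b)

    # Step 5: C decrypts and returns the masked gradients; each party removes its mask.
    loss = dec(n, priv, enc_loss)
    grad_a = [dec(n, priv, g) - m for g, m in zip(enc_g_a, mask_a)]
    grad_b = [dec(n, priv, g) - m for g, m in zip(enc_g_b, mask_b)]
    return loss, grad_a, grad_b
```

For example, `vfl_round([[1, 2], [3, 4]], [1, 0], [4, 5], [[1], [2]], [2])` gives residuals of -1 and 2, so it returns a loss of 5 with gradients `[5, 6]` for A and `[3]` for B, matching what the same computation yields in the clear. In this sketch all three parties run inside one function; a real deployment would exchange only the ciphertexts and masked values across the network.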