Forgetting Any Data at Any Time: A Theoretically Certified Unlearning Framework for Vertical Federated Learning

Linian Wang, Leye Wang

TL;DR

This paper addresses the right to be forgotten (RTBF) in Vertical Federated Learning (VFL) with a unified unlearning framework (VFU) that supports forgetting any data at any time, offers model- and data-agnostic guarantees, and includes an asynchronous forgetting mechanism. It builds on the Aggregate VFL (AggVFL) paradigm and introduces a confidence-matrix representation that aggregates per-sample, per-class confidences, so that diverse forgetting targets (client, feature, and sensitive information removal) can be processed as closed-form updates and backpropagated within the same framework. For models with strongly convex losses, the authors establish theoretical guarantees for $\epsilon$-certified (or $(\epsilon,\delta)$-certified) unlearning by combining Gaussian gradient-noise injection with a first-round gradient-ascent/descent update, with proofs sketched and detailed in appendices. The work also proposes the first asynchronous VFU system, which uses the stability of the update-contribution factor to estimate the impact of offline clients and reduce coordination overhead. Experiments across multiple datasets with both logistic regression (LR) and multilayer perceptron (MLP) models show fidelity close to retraining and competitive efficiency, indicating practical applicability for RTBF-compliant VFL in privacy-sensitive domains.
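To make the confidence-matrix idea concrete, here is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes that each client emits an `(n_samples, n_classes)` matrix of per-class confidences and that the aggregator combines them by summation. The summation rule, the function names, and the random data are all illustrative assumptions. Under that assumption, removing a client amounts to subtracting its matrix from the aggregate.

```python
import numpy as np


def aggregate(client_conf):
    """Sum per-client confidence matrices (assumed aggregation rule)."""
    return np.sum(np.stack(client_conf), axis=0)


def softmax(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = m - m.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical toy setup: 3 clients, 4 aligned samples, 3 classes.
rng = np.random.default_rng(0)
n_samples, n_classes, n_clients = 4, 3, 3
client_conf = [rng.normal(size=(n_samples, n_classes)) for _ in range(n_clients)]

full = aggregate(client_conf)

# Under sum aggregation, dropping client 1 is a closed-form change
# to the aggregated confidence matrix: subtract its contribution.
without_client_1 = full - client_conf[1]

probs_before = softmax(full)
probs_after = softmax(without_client_1)
```

In this toy setting the difference between `probs_before` and `probs_after` is the change in aggregated predictions that an unlearning update would then need to propagate back to the remaining clients' models.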

Abstract

Privacy concerns in machine learning are heightened by regulations such as the GDPR, which enforces the "right to be forgotten" (RTBF), driving the emergence of machine unlearning as a critical research field. Vertical Federated Learning (VFL) enables collaborative model training by aggregating a sample's features across distributed parties while preserving data privacy at each source. This paradigm has seen widespread adoption in healthcare, finance, and other privacy-sensitive domains. However, existing VFL systems lack robust mechanisms to comply with RTBF requirements, as unlearning methodologies for VFL remain underexplored. In this work, we introduce the first VFL framework with theoretically guaranteed unlearning capabilities, enabling the removal of any data at any time. Unlike prior approaches -- which impose restrictive assumptions on model architectures or data types for removal -- our solution is model- and data-agnostic, offering universal compatibility. Moreover, our framework supports asynchronous unlearning, eliminating the need for all parties to be simultaneously online during the forgetting process. These advancements address critical gaps in current VFL systems, ensuring compliance with RTBF while maintaining operational flexibility. We make all our implementations publicly available at https://github.com/wangln19/vertical-federated-unlearning.

Paper Structure

This paper contains 19 sections, 3 theorems, 56 equations, 8 figures, 7 tables.

Key Result

Theorem 1

Assume the loss $\ell(\theta; z)$ is convex, $\gamma$-smooth with $L_2$ regularization $\frac{\lambda}{2}\|\theta\|_2^2$. For any data modification $(Z, \tilde{Z})$, our method achieves $(\epsilon, \delta)$-certified unlearning with $\delta=1.5e^{-c^2/2}$ when:
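Independently of the condition under which the guarantee holds, the stated relation $\delta = 1.5e^{-c^2/2}$ can be evaluated directly. The short sketch below computes $\delta$ for a given $c$ and inverts the relation to find the $c$ needed for a target $\delta$. The function names are illustrative and do not come from the paper or its repository.

```python
import math


def delta_from_c(c: float) -> float:
    """Evaluate delta = 1.5 * exp(-c^2 / 2) as stated in Theorem 1."""
    return 1.5 * math.exp(-c * c / 2.0)


def c_from_delta(delta: float) -> float:
    """Invert the relation: c = sqrt(2 * ln(1.5 / delta)), for 0 < delta < 1.5."""
    if not 0.0 < delta < 1.5:
        raise ValueError("delta must lie in (0, 1.5)")
    return math.sqrt(2.0 * math.log(1.5 / delta))


# Example: the c required for a target failure probability of 1e-5.
c_needed = c_from_delta(1e-5)
```

Because $\delta$ decays like a Gaussian tail in $c$, small increases in $c$ buy large reductions in $\delta$. A target of $\delta = 10^{-5}$, for instance, needs $c \approx 4.88$.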

Figures (8)

  • Figure 1: What Asynchronous Unlearning Is. When Client 2 proposes a feature removal request, only Clients 1, 2, and K are online to unlearn.
  • Figure 2: Structure and the Unlearning Process of Our Method.
  • Figure 3: Suppose the whole matrix is the data from one client. In client removal, the whole matrix is removed. In feature removal, the feature to be removed is marked in blue. In sensitive information removal, the sensitive information to be removed is marked in red.
  • Figure 4: Feature Removal Results.
  • Figure 5: Sensitive Information Removal Results.
  • ...and 3 more figures

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Theorem 1: Certified Unlearning Guarantee
  • Proof: Proof Sketch
  • Lemma 1
  • Lemma 2