Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting without Disclosure

Hanlin Gu, Hong Xi Tae, Chee Seng Chan, Lixin Fan

TL;DR

The paper addresses the problem of removing sensitive labels in Vertical Federated Learning by introducing a few-shot label unlearning framework that leverages representation-level manifold mixup, gradient-based forgetting, and a recovery phase to preserve performance on retained data. The approach enables efficient unlearning using a small public dataset and proves that augmented gradient directions align with full-data unlearning, supported by theoretical insights and extensive experiments. It demonstrates strong utility preservation, effective unlearning of targeted labels, and superior time efficiency across diverse datasets and modalities, while also introducing a formal notion of process privacy for VFL unlearning. The work provides a practical and scalable solution for privacy-preserving collaboration in VFL and opens avenues for further refinement of privacy guarantees and asynchronous deployments.

Abstract

This paper addresses the critical challenge of unlearning in Vertical Federated Learning (VFL), a setting that has received far less attention than its horizontal counterpart. Specifically, we propose the first method tailored to label unlearning in VFL, where labels play a dual role as both essential inputs and sensitive information. To this end, we employ a representation-level manifold mixup mechanism to generate synthetic embeddings for both unlearned and retained samples, providing richer signals for the subsequent gradient-based label forgetting and recovery steps. These augmented embeddings are then subjected to gradient-based label forgetting, effectively removing the associated label information from the model. To recover performance on the retained data, we introduce a recovery-phase optimization step that refines the remaining embeddings. This design achieves effective label unlearning while maintaining computational efficiency. We validate our method through extensive experiments on diverse datasets, including MNIST, CIFAR-10, CIFAR-100, ModelNet, Brain Tumor MRI, COVID-19 Radiography, and Yahoo Answers, which demonstrate strong efficacy and scalability. Overall, this work establishes a new direction for unlearning in VFL, showing that re-imagining mixup as an efficient mechanism can unlock practical and utility-preserving unlearning. The code is publicly available at https://github.com/bryanhx/Towards-Privacy-Guaranteed-Label-Unlearning-in-Vertical-Federated-Learning
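The three stages named in the abstract (mixup on embeddings, gradient-based forgetting, recovery) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear softmax head `W`, the Beta-distributed mixing coefficient, the use of plain gradient ascent for forgetting, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def manifold_mixup(embeddings, labels_onehot, alpha=1.0, rng=None):
    """Mix each embedding (and its label) with a randomly paired one.

    Assumption: lambda ~ Beta(alpha, alpha), as in standard mixup.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = embeddings.shape[0]
    lam = rng.beta(alpha, alpha, size=(n, 1))   # one coefficient per sample
    perm = rng.permutation(n)                   # random mixing partners
    mixed_h = lam * embeddings + (1.0 - lam) * embeddings[perm]
    mixed_y = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_h, mixed_y

def _log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(W, H, Y):
    """Mean cross-entropy of a linear softmax head W on embeddings H."""
    return float(-(Y * _log_softmax(H @ W)).sum(axis=1).mean())

def ce_gradient(W, H, Y):
    """Gradient of the mean cross-entropy with respect to W."""
    probs = np.exp(_log_softmax(H @ W))
    return H.T @ (probs - Y) / H.shape[0]

def forget_step(W, H_forget, Y_forget, lr=0.1):
    """Hypothetical forgetting step: gradient ASCENT on the unlearned label's data."""
    return W + lr * ce_gradient(W, H_forget, Y_forget)

def recover_step(W, H_retain, Y_retain, lr=0.1):
    """Hypothetical recovery step: gradient DESCENT on the retained data."""
    return W - lr * ce_gradient(W, H_retain, Y_retain)
```

In this sketch the forgetting step raises the loss on the mixed embeddings of the label to be unlearned, and the recovery step lowers the loss on retained data; in a real VFL deployment the passive and active models would be separate networks held by different parties.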

Paper Structure

This paper contains 37 sections, 2 theorems, 20 equations, 13 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

Suppose that both the trained passive model $\theta$ and the active model $\omega$ achieve a training loss bounded by a small value $\epsilon$. Then, when unlearning a single label, the following holds: where $(\vec{H}^u, \vec{y} ^u)$ denotes the manifold mixup embeddings and labels of the public data $\mathcal{D}^{p,u}$ associated with the unlearned label, $({H}^u, y^u)$ denotes the embeddings

Figures (13)

  • Figure 1: Overview of our proposed few-shot unlearning framework in VFL setting.
  • Figure 2: Runtime of each unlearning method, in seconds.
  • Figure 3: Single-label unlearning scenario on the Yahoo Answers dataset with the MixText architecture.
  • Figure 4: Comparison of the utility and unlearning effectiveness on different sizes of $\mathcal{D}^{p,u}$.
  • Figure 5: Accuracy of $\mathcal{D}^r$, $y^u$ and ASR for each unlearning method across ResNet18 model in single-label unlearning on different numbers of passive parties.
  • ...and 8 more figures

Theorems & Definitions (4)

  • Theorem 1
  • Definition 1: Process Privacy
  • Theorem 2
  • proof