Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting without Disclosure
Hanlin Gu, Hong Xi Tae, Chee Seng Chan, Lixin Fan
TL;DR
The paper addresses the problem of removing sensitive labels in Vertical Federated Learning (VFL) by introducing a few-shot label unlearning framework that combines representation-level manifold mixup, gradient-based forgetting, and a recovery phase that preserves performance on retained data. The approach performs unlearning with only a small public dataset, and the authors prove that gradient directions computed on the augmented embeddings align with those of full-data unlearning, a result supported by extensive experiments. The method preserves utility, effectively unlearns the targeted labels, and achieves superior time efficiency across diverse datasets and modalities; the paper also introduces a formal notion of process privacy for VFL unlearning. The work offers a practical, scalable solution for privacy-preserving collaboration in VFL and opens avenues for refining its privacy guarantees and supporting asynchronous deployments.
Abstract
This paper addresses the critical challenge of unlearning in Vertical Federated Learning (VFL), a setting that has received far less attention than its horizontal counterpart. Specifically, we propose the first method tailored to \textit{label unlearning} in VFL, where labels play a dual role as both essential inputs and sensitive information. To this end, we employ a representation-level manifold mixup mechanism that generates synthetic embeddings for both unlearned and retained samples, providing richer signals for the subsequent forgetting and recovery steps. The augmented embeddings are then subjected to gradient-based label forgetting, which removes the associated label information from the model. To recover performance on the retained data, we introduce a recovery-phase optimization step that refines the remaining embeddings. This design achieves effective label unlearning while maintaining computational efficiency. Extensive experiments on diverse datasets, including MNIST, CIFAR-10, CIFAR-100, ModelNet, Brain Tumor MRI, COVID-19 Radiography, and Yahoo Answers, demonstrate the method's strong efficacy and scalability. Overall, this work establishes a new direction for unlearning in VFL, showing that re-imagining mixup as an efficient mechanism can unlock practical and utility-preserving unlearning. The code is publicly available at \href{https://github.com/bryanhx/Towards-Privacy-Guaranteed-Label-Unlearning-in-Vertical-Federated-Learning}{https://github.com/bryanhx/Towards-Privacy-Guaranteed-Label-Unlearning-in-Vertical-Federated-Learning}
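To make the three stages (mixup, forgetting, recovery) concrete, here is a minimal NumPy sketch under strong simplifying assumptions; it is not the authors' implementation. We assume the active party's top model is a linear softmax head `W` over fixed passive-party embeddings (bottom models are not updated). Forgetting is assumed to be gradient ascent on the cross-entropy of the unlearned label, a common choice, while the paper's exact forgetting and recovery objectives may differ. Mixup draws its coefficient from Beta(alpha, alpha) as in standard mixup. All function names (`manifold_mixup`, `forget_phase`, `recover_phase`) and the toy data are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def ce_loss_and_grad(W, X, Y):
    """Mean soft-label cross-entropy of a linear head W (d x C) and its gradient in W."""
    log_p = log_softmax(X @ W)
    loss = -np.mean(np.sum(Y * log_p, axis=1))
    grad = X.T @ (np.exp(log_p) - Y) / X.shape[0]
    return loss, grad


def manifold_mixup(emb, labels, num_classes, n_aug, alpha, rng):
    """Return the original embeddings plus n_aug convex combinations of random pairs.

    Labels are mixed with the same coefficient lam ~ Beta(alpha, alpha), as in standard mixup.
    """
    onehot = np.eye(num_classes)[labels]
    i = rng.integers(0, len(emb), size=n_aug)
    j = rng.integers(0, len(emb), size=n_aug)
    lam = rng.beta(alpha, alpha, size=(n_aug, 1))
    X_mix = lam * emb[i] + (1.0 - lam) * emb[j]
    Y_mix = lam * onehot[i] + (1.0 - lam) * onehot[j]
    return np.vstack([emb, X_mix]), np.vstack([onehot, Y_mix])


def forget_phase(W, X_u, Y_u, lr, steps):
    """Assumed forgetting objective: gradient ASCENT on the unlearned label's cross-entropy."""
    W = W.copy()
    for _ in range(steps):
        _, g = ce_loss_and_grad(W, X_u, Y_u)
        W += lr * g
    return W


def recover_phase(W, X_r, Y_r, lr, steps):
    """Recovery: gradient DESCENT on (mixed) retained embeddings to restore utility."""
    W = W.copy()
    for _ in range(steps):
        _, g = ce_loss_and_grad(W, X_r, Y_r)
        W -= lr * g
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy setup: 3 classes, 2-D embeddings, 5 public samples per class.
    centers = np.array([[4.0, 0.0], [0.0, 4.0], [-4.0, -4.0]])
    labels = np.repeat(np.arange(3), 5)
    emb = centers[labels] + 0.5 * rng.standard_normal((15, 2))
    forget = labels == 0  # unlearn label 0
    X_u, Y_u = manifold_mixup(emb[forget], labels[forget], 3, 20, 0.4, rng)
    X_r, Y_r = manifold_mixup(emb[~forget], labels[~forget], 3, 40, 0.4, rng)
    W0 = 0.1 * centers.T  # stand-in for a trained top model (d x C)
    W1 = forget_phase(W0, X_u, Y_u, lr=0.5, steps=5)
    W2 = recover_phase(W1, X_r, Y_r, lr=0.05, steps=50)
    for name, W in [("before", W0), ("after forget", W1), ("after recovery", W2)]:
        acc_u = np.mean((emb[forget] @ W).argmax(axis=1) == 0)
        acc_r = np.mean((emb[~forget] @ W).argmax(axis=1) == labels[~forget])
        print(f"{name}: acc on label 0 = {acc_u:.2f}, acc on retained = {acc_r:.2f}")
```

Mixing within the unlearned set keeps the one-hot forget label, while mixing retained samples from different classes yields soft labels. Because the cross-entropy of a linear head is convex in `W`, each ascent step in `forget_phase` can only increase the loss on the unlearned embeddings. `recover_phase` uses a small step size so that descent reliably lowers the loss on the retained ones. Propagating updates to the passive parties' bottom models, and keeping that process private, is omitted from this sketch.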
