LabObf: A Label Protection Scheme for Vertical Federated Learning Through Label Obfuscation

Ying He, Mingyang Niu, Jingyu Hua, Yunlong Mao, Xu Huang, Chen Li, Sheng Zhong

TL;DR

This paper addresses label privacy in vertical federated learning (VFL) built on SplitNN, where the embeddings uploaded at the cut layer may reveal the true labels. It introduces an Embedding Extension Attack that exposes weaknesses in existing correlation-constraining defenses such as Discorloss. It then proposes LabObf, a label obfuscation scheme in which the host (the label holder) maps each original class to multiple real-valued soft labels whose values are intertwined across classes, using a mapping that only the host controls. Because the VFL model is trained on these soft labels rather than the true ones, the correlation between cut-layer embeddings and true labels drops while accuracy is largely preserved. Experiments on four datasets show substantially lower attack success rates at a manageable performance cost. The work advances practical label privacy in two-party VFL by proposing a defense that is difficult for attackers to circumvent, and it provides empirical evidence of both its effectiveness and its overheads.
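To make the idea of "intertwined" soft labels concrete, here is a minimal Python sketch under simplifying assumptions. It is not the paper's construction: the helper names (`build_obfuscation_map`, `obfuscate`, `decode`), the evenly spaced grid in (0, 1), and the band-by-band random assignment are all hypothetical choices made for illustration. Each class receives one soft value from every band of consecutive grid points, so the value ranges of different classes overlap. The host then decodes a predicted soft value to the class that owns the nearest soft label.

```python
import random

def build_obfuscation_map(num_classes, values_per_class, seed=0):
    """Hypothetical sketch: give each class `values_per_class` distinct soft labels in (0, 1)."""
    rng = random.Random(seed)
    total = num_classes * values_per_class
    grid = [(i + 1) / (total + 1) for i in range(total)]  # evenly spaced candidate values
    mapping = {c: [] for c in range(num_classes)}
    # The grid is split into `values_per_class` bands of `num_classes` consecutive
    # values; every class takes exactly one value from each band (in a random
    # order), so the value ranges of different classes overlap.
    for band in range(values_per_class):
        order = list(range(num_classes))
        rng.shuffle(order)
        for offset, c in enumerate(order):
            mapping[c].append(grid[band * num_classes + offset])
    return mapping

def obfuscate(labels, mapping, rng):
    """Replace each integer label with one of its class's soft labels, chosen at random."""
    return [rng.choice(mapping[y]) for y in labels]

def decode(soft_predictions, mapping):
    """Host-side translation: assign each predicted soft value to the class
    that owns the nearest soft label."""
    table = [(v, c) for c, values in mapping.items() for v in values]
    return [min(table, key=lambda vc: abs(vc[0] - p))[1] for p in soft_predictions]

mapping = build_obfuscation_map(num_classes=3, values_per_class=2)
labels = [0, 1, 2, 1, 0, 2]
soft = obfuscate(labels, mapping, random.Random(1))
print(soft)
print(decode(soft, mapping))  # recovers `labels` when predictions are exact
```

Only the host knows `mapping`. A party that sees (or correlates with) the soft targets observes values whose ranges overlap across classes, which is the intuition behind the obfuscation; the actual scheme's mapping and training objective may differ from this toy version.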

Abstract

The Split Neural Network (SplitNN), one of the most common architectures in vertical federated learning (VFL), is popular in industry because of its privacy-preserving properties. In this architecture, the party holding the labels lacks sufficient feature data and therefore cooperates with other parties to improve model performance. Each of these participants has a self-defined bottom model that learns hidden representations from its own feature data, and it uploads the resulting embedding vectors to the top model, held by the label holder, which makes the final predictions. This design allows participants to train jointly without directly exchanging data. However, prior work shows that malicious participants can still infer label information from the uploaded embeddings, leading to privacy leakage. In this paper, we first propose an embedding extension attack that manipulates the uploaded embeddings to defeat existing defenses, which rely on constraining the correlation between participants' embeddings and the labels. We then propose a new label obfuscation defense, called LabObf, which randomly maps each original integer-valued label to multiple real-valued soft labels whose values are intertwined across classes, making it significantly harder for attackers to infer the labels. Experiments on four different types of datasets show that LabObf significantly reduces the attacker's success rate compared with raw (undefended) models while maintaining desirable model accuracy.
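The SplitNN data flow described in the abstract can be sketched in plain Python. This is an illustrative toy with no training loop. It assumes single-layer ReLU bottom models and a single linear top model, and the class and function names (`BottomModel`, `TopModel`, `splitnn_forward`) are hypothetical. The sketch shows only that each party embeds its own vertical slice of the features and that the top model sees nothing but the concatenated embeddings.

```python
import random

def linear(weights, x):
    """Dense layer without bias: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def relu(v):
    return [max(0.0, a) for a in v]

def random_weights(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

class BottomModel:
    """Participant-side model: maps local features to a cut-layer embedding."""
    def __init__(self, in_dim, emb_dim, rng):
        self.w = random_weights(emb_dim, in_dim, rng)

    def forward(self, x):
        return relu(linear(self.w, x))

class TopModel:
    """Label holder's model: maps the concatenated embeddings to outputs."""
    def __init__(self, in_dim, out_dim, rng):
        self.w = random_weights(out_dim, in_dim, rng)

    def forward(self, emb):
        return linear(self.w, emb)

def splitnn_forward(bottoms, top, feature_parts):
    # Each party embeds only its own vertical slice of the features; only the
    # embeddings (never the raw features) are sent to the top model.
    emb = []
    for model, x in zip(bottoms, feature_parts):
        emb.extend(model.forward(x))
    return top.forward(emb)

rng = random.Random(0)
bottoms = [BottomModel(3, 4, rng), BottomModel(2, 4, rng)]  # two parties, 4-dim embeddings each
top = TopModel(8, 2, rng)                                   # 4 + 4 = 8 input dims, 2 outputs
print(splitnn_forward(bottoms, top, [[0.5, -1.0, 2.0], [1.0, 0.3]]))
```

The concatenated embedding is exactly the interface that the label-inference attacks and correlation-based defenses discussed in the paper operate on.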

Paper Structure (20 sections, 5 equations, 12 figures, 6 tables, 2 algorithms)

Figures (12)

  • Figure 1: The architecture of SplitNN-based VFL
  • Figure 2: The workflow of the embedding extension attack. A perturbation generative model on the client side takes the original embedding as input and generates perturbation dimensions; the client appends these dimensions to the original embedding and uploads the modified embedding to the top model.
  • Figure 3: The Pearson correlation between the embedding dimensions and the real labels, where $P01$ to $P04$ represent the additional perturbation dimensions.
  • Figure 4: The key idea of LabObf is to map the original class distribution to a new scrambled distribution.
  • Figure 5: The workflow of LabObf. The host uses soft labels to train the VFL model. During the prediction phase, the top model generates soft labels, and the host translates them into real labels.
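The mechanics behind Figures 2 and 3 can be illustrated with a small Python sketch. It appends extra dimensions to each client embedding, then measures the Pearson correlation of every embedding dimension with the labels. A fixed toy function (`toy_perturb`, hypothetical) stands in for the attack's trained perturbation generative model. The embeddings and labels are made-up illustrative data, so the sketch does not reproduce the paper's results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: correlation is undefined, report 0 here
    return cov / (sx * sy)

def extend_embeddings(embeddings, perturb):
    """Append the dimensions produced by `perturb` to every embedding."""
    return [list(e) + list(perturb(e)) for e in embeddings]

def per_dimension_correlation(embeddings, labels):
    """Correlation of each embedding dimension with the labels (cf. Figure 3)."""
    dims = len(embeddings[0])
    return [pearson([e[d] for e in embeddings], labels) for d in range(dims)]

def toy_perturb(e):
    # Hypothetical stand-in for the perturbation generative model:
    # two extra dimensions derived deterministically from the embedding.
    return [e[0] - e[1], 0.5 * sum(e)]

embeddings = [[0.1, 1.0], [0.9, 2.0], [0.2, 3.0], [0.8, 4.0]]
labels = [0, 1, 0, 1]
extended = extend_embeddings(embeddings, toy_perturb)
for d, r in enumerate(per_dimension_correlation(extended, labels)):
    print(f"dim {d}: r = {r:+.3f}")
```

A defense that only penalizes embedding-label correlation sees the whole extended vector. An attacker who controls the bottom model also controls which dimensions are appended, which is the lever the embedding extension attack exploits.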