Federated Transformer: Multi-Party Vertical Federated Learning on Practical Fuzzily Linked Data

Zhaomin Wu, Junyi Hou, Yiqun Diao, Bingsheng He

TL;DR

The Federated Transformer (FeT) is a novel framework that supports multi-party VFL with fuzzy identifiers. It encodes these identifiers into data representations and employs a transformer architecture distributed across the parties, incorporating three new techniques to enhance performance.

Abstract

Federated Learning (FL) is an evolving paradigm that enables multiple parties to collaboratively train models without sharing raw data. Among its variants, Vertical Federated Learning (VFL) is particularly relevant in real-world, cross-organizational collaborations, where distinct features of a shared instance group are contributed by different parties. In these scenarios, parties are often linked using fuzzy identifiers, leading to a common practice termed multi-party fuzzy VFL. Existing models generally address either multi-party VFL or fuzzy VFL between two parties. Extending these models to practical multi-party fuzzy VFL typically results in significant performance degradation and increased costs for maintaining privacy. To overcome these limitations, we introduce the Federated Transformer (FeT), a novel framework that supports multi-party VFL with fuzzy identifiers. FeT encodes these identifiers into data representations and employs a transformer architecture distributed across different parties, incorporating three new techniques to enhance performance. Furthermore, we have developed a multi-party privacy framework for VFL that integrates differential privacy with secure multi-party computation, protecting local representations while minimizing the associated utility costs. Our experiments demonstrate that FeT surpasses the baseline models by up to 46% in accuracy when scaled to 50 parties. Additionally, in two-party fuzzy VFL settings, FeT shows improved performance and privacy over cutting-edge VFL models.

Paper Structure

This paper contains 46 sections, 4 theorems, 5 equations, 15 figures, 9 tables, 1 algorithm.

Key Result

Theorem 1

For a function $f:\mathbb{X}\rightarrow\mathbb{R}^d$ characterized by a global $L_2$ sensitivity $\Delta_2$, which signifies that the maximum difference in the $L_2$-norm of the outputs of $f$ on any two neighboring databases is $\Delta_2$, and for any $\varepsilon \ge 0$ and $\delta \in [0,1]$, the
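The statement above is cut off in this excerpt. For orientation only: the Gaussian mechanism result it cites (Balle and Wang, 2018) is commonly stated as follows. Releasing $f(x) + Z$ with $Z \sim \mathcal{N}(0, \sigma^2 I)$ is $(\varepsilon, \delta)$-differentially private if and only if $\Phi\left(\frac{\Delta_2}{2\sigma} - \frac{\varepsilon\sigma}{\Delta_2}\right) - e^{\varepsilon}\,\Phi\left(-\frac{\Delta_2}{2\sigma} - \frac{\varepsilon\sigma}{\Delta_2}\right) \le \delta$, where $\Phi$ is the standard normal CDF. The Python sketch below assumes that condition and finds the smallest admissible $\sigma$ by bisection. The function names (`gaussian_delta`, `calibrate_sigma`, `perturb`) are hypothetical, and the excerpt does not show how FeT itself calibrates or applies its noise.

```python
import math
import random


def std_normal_cdf(t: float) -> float:
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def gaussian_delta(sigma: float, eps: float, sensitivity: float) -> float:
    """Left-hand side of the assumed condition: the delta reached at this sigma."""
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(eps) * std_normal_cdf(-a - b)


def calibrate_sigma(eps: float, delta: float, sensitivity: float,
                    rel_tol: float = 1e-9) -> float:
    """Smallest sigma (up to rel_tol) with gaussian_delta(sigma) <= delta."""
    if eps < 0.0 or not (0.0 < delta < 1.0) or sensitivity <= 0.0:
        raise ValueError("need eps >= 0, 0 < delta < 1, sensitivity > 0")
    lo, hi = 1e-6 * sensitivity, sensitivity
    # The reached delta decreases as sigma grows, so widen hi until it suffices.
    while gaussian_delta(hi, eps, sensitivity) > delta:
        hi *= 2.0
    # Bisection: hi always satisfies the condition.
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, eps, sensitivity) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def perturb(values, sigma: float, rng: random.Random):
    """Add independent N(0, sigma^2) noise to each coordinate of the output."""
    return [v + rng.gauss(0.0, sigma) for v in values]
```

A call such as `calibrate_sigma(0.5, 1e-5, 1.0)` returns a noise scale that can then be passed to `perturb`. Because the condition is an "if and only if", the result is never larger than the classical calibration $\sigma = \Delta_2\sqrt{2\ln(1.25/\delta)}/\varepsilon$, which is only sufficient and only valid for $\varepsilon < 1$.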

Figures (15)

  • Figure 1: Real application of multi-party fuzzy VFL: travel cost prediction in a city
  • Figure 2: Structure of federated transformer (PE: multi-dimensional positional encoding)
  • Figure 3: Learned dynamic masks of different samples: Each figure displays one sample (red star) from the primary party fuzzily linked with 4900 samples (circles) from 49 secondary parties. The position indicates the sample's identifier, and colors reflect learned dynamic mask values. Larger mask values signify higher importance in attention layers.
  • Figure 4: Misalignment of positional encoding ($P_0$: primary party; $P_1\sim P_3$: secondary parties)
  • Figure 5: Differentially private split-sum neural network
  • ...and 10 more figures

Theorems & Definitions (6)

  • Definition 1
  • Theorem 1: Gaussian Mechanism (Balle and Wang, 2018)
  • Theorem 2: Moments Accountant (Abadi et al., 2016)
  • Theorem 3
  • Theorem 3
  • proof