VFLGAN-TS: Vertical Federated Learning-based Generative Adversarial Networks for Publication of Vertically Partitioned Time-Series Data

Xun Yuan, Zilong Zhao, Prosanta Gope, Biplab Sikdar

TL;DR

VFLGAN-TS combines an attribute discriminator with vertical federated learning to generate synthetic time-series data in the vertically partitioned scenario. An enhanced privacy auditing scheme is also developed to evaluate potential privacy breaches through the VFLGAN-TS framework and its synthetic datasets.

Abstract

In the current artificial intelligence (AI) era, the scale and quality of the dataset play a crucial role in training a high-quality AI model. However, original data often cannot be shared due to privacy concerns and regulations. A potential solution is to release a synthetic dataset with a distribution similar to that of the private dataset. Nevertheless, in some scenarios, the attributes required to train an AI model are distributed among different parties, and the parties cannot share their local data for synthetic data construction due to privacy regulations. In PETS 2024, we introduced the first Vertical Federated Learning-based Generative Adversarial Network (VFLGAN) for publishing vertically partitioned static data. However, VFLGAN cannot effectively handle time-series data, which has both temporal and attribute dimensions. In this article, we propose VFLGAN-TS, which combines the ideas of an attribute discriminator and vertical federated learning to generate synthetic time-series data in the vertically partitioned scenario. The performance of VFLGAN-TS is close to that of its counterpart, which is trained in a centralized manner and represents the upper limit for VFLGAN-TS. To further protect privacy, we apply a Gaussian mechanism to make VFLGAN-TS satisfy $(\epsilon, \delta)$-differential privacy. In addition, we develop an enhanced privacy auditing scheme to evaluate potential privacy breaches through the VFLGAN-TS framework and its synthetic datasets.

Paper Structure

This paper contains 30 sections, 7 theorems, 31 equations, 7 figures, 6 tables, 3 algorithms.

Key Result

Proposition 1

(Composition of RDP) Let $f : D \rightarrow R_1$ be $(\alpha, \epsilon_1)$-RDP and $g : R_1 \times D \rightarrow R_2$ be $(\alpha,\epsilon_2)$-RDP. Then the mechanism defined as $(X, Y)$, where $X \sim f(D)$ and $Y \sim g(X, D)$, satisfies $(\alpha, \epsilon_1 + \epsilon_2)$-RDP.
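
To make the composition rule concrete, the sketch below accumulates a Rényi differential privacy (RDP) budget over repeated applications of a Gaussian mechanism and converts the total to an $(\epsilon, \delta)$-DP guarantee. It is an illustration under stated assumptions, not the paper's accounting procedure: the per-step bound $\alpha \Delta^2 / (2\sigma^2)$ and the conversion $\epsilon = \epsilon_{\text{RDP}} + \log(1/\delta)/(\alpha - 1)$ are the standard results from the RDP literature rather than formulas quoted in this summary, and the function names and the values of `alpha`, `sigma`, `steps`, and `delta` are hypothetical.

```python
import math


def gaussian_rdp(alpha: float, sigma: float, sensitivity: float = 1.0) -> float:
    """RDP epsilon at order alpha for a Gaussian mechanism.

    Assumes the standard bound alpha * sensitivity^2 / (2 * sigma^2),
    where sigma is the noise standard deviation.
    """
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def compose_rdp(epsilons: list[float]) -> float:
    """Proposition 1 applied repeatedly: at a fixed order alpha,
    the RDP epsilons of sequentially composed mechanisms add up."""
    return sum(epsilons)


def rdp_to_dp(alpha: float, rdp_epsilon: float, delta: float) -> float:
    """Convert an (alpha, rdp_epsilon)-RDP guarantee to (epsilon, delta)-DP
    using the standard conversion (an assumption, not quoted from the paper)."""
    return rdp_epsilon + math.log(1.0 / delta) / (alpha - 1.0)


# Hypothetical settings chosen only for illustration.
alpha, sigma, steps, delta = 10.0, 5.0, 10, 1e-5

per_step = gaussian_rdp(alpha, sigma)       # budget of one noisy update
total_rdp = compose_rdp([per_step] * steps)  # budget after all updates
epsilon = rdp_to_dp(alpha, total_rdp, delta)

print(f"per-step RDP epsilon: {per_step:.4f}")
print(f"composed RDP epsilon: {total_rdp:.4f}")
print(f"(epsilon, delta)-DP:  ({epsilon:.4f}, {delta})")
```

In practice the order $\alpha$ is usually swept over a range and the smallest resulting $\epsilon$ is reported, since the composition holds at each fixed order separately.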

Figures (7)

  • Figure 1: Framework of the Proposed VFLGAN-TS.
  • Figure 2: Wasserstein distance curves during training different methods on Sine Datasets.
  • Figure 3: The histograms in each sub-figure show the amplitude distribution of each attribute. The spots in each sub-figure represent a sample's amplitudes of both attributes.
  • Figure 4: Wasserstein distance curves during training different methods on EEG Dataset.
  • Figure 5: Visualization of similarity between real and synthetic datasets using PCA and t-SNE. The left two columns are results for EEG 0 and the right two columns are results for EEG 1.
  • ...and 2 more figures

Theorems & Definitions (14)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 1
  • proof
  • Proposition 4
  • proof
  • ...and 4 more