Privacy-Aware Time Series Synthesis via Public Knowledge Distillation

Penghang Liu, Haibei Zhu, Eleonora Kreacic, Svitlana Vyetrenko

TL;DR

This work tackles privacy-preserving synthesis of sensitive multivariate time series by leveraging heterogeneous public knowledge. It introduces Pub2Priv, a conditional diffusion framework powered by a knowledge transformer that encodes public metadata into conditioning embeddings, trained under DP-SGD to guarantee $(\varepsilon,\delta)$-DP. A practical identifiability metric is proposed to assess empirical privacy leakage beyond theoretical DP guarantees. Across finance, energy, and commodity domains, Pub2Priv delivers improved privacy-utility trade-offs over DP-based baselines and provides insights into how public-private correlations influence downstream utility.

Abstract

Sharing sensitive time series data, such as patient records or investment accounts, is often restricted due to privacy concerns in domains such as finance, healthcare, and energy consumption. Privacy-aware synthetic time series generation addresses this challenge by injecting noise during training, which inherently introduces a trade-off between privacy and utility. In many cases, sensitive sequences are correlated with publicly available, non-sensitive contextual metadata (e.g., household electricity consumption may be influenced by weather conditions and electricity prices). However, existing privacy-aware data generation methods often overlook this opportunity, resulting in suboptimal privacy-utility trade-offs. In this paper, we present Pub2Priv, a novel framework for generating private time series data by leveraging heterogeneous public knowledge. Our model employs a self-attention mechanism to encode public data into temporal and feature embeddings, which serve as conditional inputs for a diffusion model to generate synthetic private sequences. Additionally, we introduce a practical metric to assess privacy by evaluating the identifiability of the synthetic data. Experimental results show that Pub2Priv consistently outperforms state-of-the-art benchmarks in improving the privacy-utility trade-off across finance, energy, and commodity trading domains.

Paper Structure

This paper contains 51 sections, 2 theorems, 26 equations, 9 figures, 7 tables, 1 algorithm.

Key Result

Theorem B.1

(Mironov, 2017) For a query function $f$ with sensitivity $S = \max_{d, d'} \|f(d) - f(d')\|_2$, the Gaussian mechanism that releases $f(d) + \mathcal{N}(0, \sigma^2)$ satisfies $(\alpha, \alpha S^2 / (2\sigma^2))$-RDP.
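The bound in Theorem B.1 can be evaluated numerically. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the conversion from RDP to $(\varepsilon, \delta)$-DP uses the standard result from Mironov (2017) that $(\alpha, \varepsilon_{\mathrm{RDP}})$-RDP implies $(\varepsilon_{\mathrm{RDP}} + \log(1/\delta)/(\alpha - 1), \delta)$-DP, which may or may not match the form the paper states in its appendix.

```python
import math


def gaussian_rdp_epsilon(alpha: float, sensitivity: float, sigma: float) -> float:
    """RDP epsilon of the Gaussian mechanism at order alpha (Theorem B.1).

    Returns alpha * S^2 / (2 * sigma^2). Hypothetical helper name.
    """
    if alpha <= 1:
        raise ValueError("Renyi order alpha must be greater than 1")
    if sigma <= 0:
        raise ValueError("noise scale sigma must be positive")
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def rdp_to_dp(alpha: float, rdp_epsilon: float, delta: float) -> float:
    """Convert (alpha, rdp_epsilon)-RDP to an (epsilon, delta)-DP epsilon.

    Assumption: uses the standard conversion from Mironov (2017),
    epsilon = rdp_epsilon + log(1 / delta) / (alpha - 1).
    """
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return rdp_epsilon + math.log(1.0 / delta) / (alpha - 1.0)


def best_dp_epsilon(sensitivity, sigma, delta, alphas):
    """Pick the Renyi order that yields the smallest (epsilon, delta)-DP epsilon.

    Returns a tuple (epsilon, alpha) for a single release of the mechanism.
    """
    candidates = (
        (rdp_to_dp(a, gaussian_rdp_epsilon(a, sensitivity, sigma), delta), a)
        for a in alphas
    )
    return min(candidates)


if __name__ == "__main__":
    # Illustrative values only: unit sensitivity, sigma = 4, delta = 1e-5.
    orders = [1.5, 2, 4, 8, 16, 32, 64]
    eps, order = best_dp_epsilon(sensitivity=1.0, sigma=4.0, delta=1e-5, alphas=orders)
    print(f"epsilon = {eps:.3f} at alpha = {order}")
```

Searching over several orders $\alpha$ matters because the RDP term grows with $\alpha$ while the $\log(1/\delta)/(\alpha - 1)$ term shrinks, so the tightest $\varepsilon$ is usually found at an intermediate order. This covers a single release; accounting for many DP-SGD steps additionally requires composition and subsampling analysis, which the sketch does not attempt.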

Figures (9)

  • Figure 1: Pub2Priv generates private time series from heterogeneous public knowledge. The model generates realistic electricity consumption based on non-secret temperature and electricity price information. Household private data is protected by DP-SGD during training.
  • Figure 2: Pub2Priv architecture. Given the original data sample $x_0$, we gradually add noise through the forward process $q(x_t|x_{t-1})$. In the reverse process, we use self-attention layers $\theta_{\mathrm{T}}$ to create temporal and feature embeddings of the metadata $c$, which are passed to the conditional denoiser $\theta_{\mathrm{DM}}$ to reconstruct the original sample.
  • Figure 3: The utility-privacy trade-off for Pub2Priv and the benchmark models on the portfolio dataset.
  • Figure 4: t-SNE visualizations of synthetic data generated by Pub2Priv and the baseline models with $\varepsilon=1, \delta=1\times10^{-5}$, where the top row shows the portfolio dataset and the bottom row shows the electricity dataset. We omit the t-SNE plots for the Comtrade dataset due to its relatively small size, which results in sparse visualizations.
  • Figure 5: The identifiability $\mathcal{I}(D, D')$ of synthetic data generated by Pub2Priv and the benchmark models given different $\varepsilon$ (with $\delta=1\times10^{-5}$).
  • ...and 4 more figures

Theorems & Definitions (3)

  • Definition 3.1: $(\varepsilon, \delta)$-Differential Privacy
  • Theorem B.1
  • Theorem B.2