Privacy-Aware Time Series Synthesis via Public Knowledge Distillation
Penghang Liu, Haibei Zhu, Eleonora Kreacic, Svitlana Vyetrenko
TL;DR
This work tackles privacy-preserving synthesis of sensitive multivariate time series by leveraging heterogeneous public knowledge. It introduces Pub2Priv, a conditional diffusion framework powered by a knowledge transformer that encodes public metadata into conditioning embeddings, trained under DP-SGD to guarantee $(\varepsilon,\delta)$-DP. A practical identifiability metric is proposed to assess empirical privacy leakage beyond theoretical DP guarantees. Across finance, energy, and commodity domains, Pub2Priv delivers improved privacy-utility trade-offs over DP-based baselines and provides insights into how public-private correlations influence downstream utility.
Abstract
Sharing sensitive time series data, such as patient records or investment accounts, is often restricted in domains such as finance, healthcare, and energy consumption due to privacy concerns. Privacy-aware synthetic time series generation addresses this challenge by injecting noise during training, which inherently introduces a trade-off between privacy and utility. In many cases, sensitive sequences are correlated with publicly available, non-sensitive contextual metadata (e.g., household electricity consumption may be influenced by weather conditions and electricity prices). However, existing privacy-aware data generation methods often overlook this opportunity, resulting in suboptimal privacy-utility trade-offs. In this paper, we present Pub2Priv, a novel framework for generating private time series data by leveraging heterogeneous public knowledge. Our model employs a self-attention mechanism to encode public data into temporal and feature embeddings, which serve as conditional inputs for a diffusion model to generate synthetic private sequences. Additionally, we introduce a practical metric to assess privacy by evaluating the identifiability of the synthetic data. Experimental results show that Pub2Priv consistently outperforms state-of-the-art benchmarks in improving the privacy-utility trade-off across finance, energy, and commodity trading domains.
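As a rough illustration of the noise-during-training step referred to above (the TL;DR names DP-SGD), the following is a minimal sketch of one generic DP-SGD gradient computation: each example's gradient is clipped to a fixed L2 norm, Gaussian noise scaled to that norm is added to the sum, and the result is averaged. This is the textbook procedure, not the authors' implementation; the function name `dp_sgd_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical names chosen for this example.

```python
import numpy as np


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient from per-example gradients.

    per_example_grads: array of shape (batch_size, num_params).
    clip_norm, noise_multiplier: illustrative hyperparameters (assumed names).
    rng: a numpy.random.Generator used to draw the Gaussian noise.
    """
    grads = np.asarray(per_example_grads, dtype=float)
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Add Gaussian noise with std = noise_multiplier * clip_norm to the sum,
    # then divide by the batch size to obtain the averaged gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]


# Usage with toy per-example gradients (values are made up for illustration).
rng = np.random.default_rng(0)
toy_grads = np.array([[3.0, 4.0], [0.3, 0.4]])
noisy_grad = dp_sgd_gradient(toy_grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

A larger `noise_multiplier` gives a stronger privacy guarantee at the cost of a noisier gradient, which is the privacy-utility trade-off the abstract describes. Translating a noise level into a concrete $(\varepsilon,\delta)$ value requires a separate privacy accountant, which this sketch omits.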
