TimeWak: Temporal Chained-Hashing Watermark for Time Series Data

Zhi Wen Soi, Chaoyi Zhu, Fouad Abiad, Aditya Shankar, Jeroen M. Galjaard, Huijuan Wang, Lydia Y. Chen

TL;DR

TimeWak addresses the challenge of watermarking synthetic multivariate time-series data generated by diffusion models by embedding watermarks directly in the data space through temporal chained-hashing and feature shuffling. It introduces an $\epsilon$-exact inversion based on BDIA-DDIM to maintain high data utility while enabling reliable watermark detection. Across five datasets, TimeWak delivers superior context-FID and correlational metrics and demonstrates robust watermark detectability under post-editing attacks, outperforming state-of-the-art baselines. The work provides theoretical bounds on inversion error and offers practical implications for traceability of privacy-sensitive time-series data, with future work focusing on streaming and per-timestep watermarking capabilities.

Abstract

Synthetic time series generated by diffusion models enable sharing privacy-sensitive datasets, such as patients' functional MRI records. Key criteria for synthetic data include high data utility and traceability to verify the data source. Recent watermarking methods embed in homogeneous latent spaces, but state-of-the-art time series generators operate in data space, making latent-based watermarking incompatible. This creates the challenge of watermarking directly in data space while handling feature heterogeneity and temporal dependencies. We propose TimeWak, the first watermarking algorithm for multivariate time series diffusion models. To handle temporal dependence and spatial heterogeneity, TimeWak embeds a temporal chained-hashing watermark directly within the temporal-feature data space. The other unique feature is the $ε$-exact inversion, which addresses the non-uniform reconstruction error distribution across features from inverting the diffusion process to detect watermarks. We derive the error bound of inverting multivariate time series while preserving robust watermark detectability. We extensively evaluate TimeWak on its impact on synthetic data quality, watermark detectability, and robustness under various post-editing attacks, against five datasets and baselines of different temporal lengths. Our results show that TimeWak achieves improvements of 61.96% in context-FID score, and 8.44% in correlational scores against the strongest state-of-the-art baseline, while remaining consistently detectable.
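The chained-hashing idea can be illustrated with a minimal sketch, loosely following the Figure 1 overview. It is not the authors' implementation. It assumes binary seeds, a fresh random seed row at the start of each interval, and later rows obtained by copying the previous row under a keyed feature permutation. It also assumes the bits are embedded as the signs of the initial Gaussian noise, which is a choice made here for illustration. The diffusion model, its inversion, and the per-series shuffle are omitted, so the "recovered" noise is simply the embedded noise. All function names, the `key` string, and the use of SHA-256 are hypothetical.

```python
import hashlib
import numpy as np


def step_permutation(key: str, t: int, n_features: int) -> np.ndarray:
    """Keyed feature permutation for timestep t (hypothetical helper)."""
    digest = hashlib.sha256(f"{key}:{t}".encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    return np.random.default_rng(seed).permutation(n_features)


def chained_seed_bits(n_steps, n_features, interval, key, rng):
    """Seed bits of shape (n_steps, n_features).

    Assumption: each interval starts with fresh random bits; every other
    row is the previous row with its feature order shuffled by the key.
    """
    bits = np.empty((n_steps, n_features), dtype=np.int8)
    for t in range(n_steps):
        if t % interval == 0:
            bits[t] = rng.integers(0, 2, size=n_features)
        else:
            bits[t] = bits[t - 1][step_permutation(key, t, n_features)]
    return bits


def embed_in_noise(bits, rng):
    """Assumption: bit 1 -> positive sample, bit 0 -> negative sample."""
    magnitude = np.abs(rng.standard_normal(bits.shape))
    return np.where(bits == 1, magnitude, -magnitude)


def bit_accuracy(noise_estimate, key, interval):
    """Fraction of sign bits that agree with the chained copy of the previous row."""
    recovered = (noise_estimate > 0).astype(np.int8)
    n_steps, n_features = recovered.shape
    hits, total = 0, 0
    for t in range(n_steps):
        if t % interval == 0:
            continue  # interval starts carry fresh seeds, nothing to check
        expected = recovered[t - 1][step_permutation(key, t, n_features)]
        hits += int(np.sum(expected == recovered[t]))
        total += n_features
    if total == 0:
        raise ValueError("interval must be smaller than the number of timesteps")
    return hits / total
```

With exactly recovered noise, every chained row matches and the accuracy is 1.0, while unwatermarked Gaussian noise should score close to 0.5. In practice the inversion error determines how far a watermarked sample falls below 1.0, which is why the accuracy of the inversion matters for detection.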

Paper Structure

This paper contains 50 sections, 2 theorems, 38 equations, 12 figures, 24 tables.

Key Result

Theorem 3.1

Let $\{\mathbf{x}_t\}_{t=0}^{T}$ be the sequence of diffusion states governed by the BDIA-DDIM recurrence for a given dataset (Equation eq:BDIA-Inversion), and let the noise estimator $\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}$ satisfy Assumption assump. Suppose that instead of the ex… Let the propagated error at time $t$ be defined as … Then, for $t \geq 1$, the error is bounded by …

Figures (12)

  • Figure 1: Overview of TimeWak. First, we assign random seeds at the beginning of each interval. (1) Temporal chained-hashing: A, B, and C (pink) show seeds being copied from the previous step and the feature order shuffled. (2) Shuffling the seeds for each series; positional indices are highlighted in green. (3) Constructing an initial Gaussian noise. (4) Generating multivariate time series. (5) Reversing the diffusion process. (6) Recovering the watermark seed. (7) Unshuffling the seeds in the opposite way they were shuffled. (8) Bit accuracy between the hash and recovered seed.
  • Figure 2: Average reconstruction error distribution across feature indices and timesteps on Diffusion-TS with DDIM and DDIM inversion. Reconstruction error is the signed absolute difference between reconstructed and original values.
  • Figure 3: TPR@0.1%FPR against number of samples across five datasets under 64-length sequences.
  • Figure 4: Forward and backward diffusion process. $\mathbf{x}_0$ denotes the initial signal window and $\mathbf{x}_T$ corresponds to the fully diffused version of the signal obtained after $T$ forward diffusion steps.
  • Figure 5: $\Delta_t$ for different datasets.
  • ...and 7 more figures
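
The TPR@0.1%FPR metric named in the Figure 3 caption can be computed from two sets of detection scores. The sketch below is a generic recipe rather than the paper's evaluation code. It assumes higher scores mean "more likely watermarked", and it sets the threshold from the empirical scores of unwatermarked samples. The function name and arguments are hypothetical.

```python
import numpy as np


def tpr_at_fpr(negative_scores, positive_scores, target_fpr=0.001):
    """Return (TPR, threshold) with empirical FPR <= target_fpr.

    negative_scores: detection scores of unwatermarked samples.
    positive_scores: detection scores of watermarked samples.
    A sample is flagged when its score is strictly above the threshold.
    """
    neg = np.sort(np.asarray(negative_scores, dtype=float))[::-1]  # descending
    pos = np.asarray(positive_scores, dtype=float)
    # Number of false positives the budget allows (small slack for float error).
    allowed = int(np.floor(target_fpr * neg.size + 1e-9))
    # Using the (allowed + 1)-th largest negative as threshold lets at most
    # `allowed` negatives score strictly above it.
    threshold = neg[allowed] if allowed < neg.size else -np.inf
    tpr = float(np.mean(pos > threshold))
    return tpr, float(threshold)
```

A target of 0.1% permits one false positive per 1,000 unwatermarked samples, so the estimate is only meaningful when the negative set has at least that many samples. With fewer, the threshold falls back to the largest negative score and the empirical FPR is zero.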

Theorems & Definitions (2)

  • Theorem 3.1
  • Theorem C.1