Population Aware Diffusion for Time Series Generation

Yang Li, Han Meng, Zhenyu Bi, Ingolv T. Urnes, Haipeng Chen

TL;DR

PaD-TS tackles time-series generation by explicitly preserving population-level properties alongside individual authenticity. It introduces a population-aware training objective and a dual-channel Transformer-based encoder to maintain distributions of values and cross-dimension dependencies, such as cross-correlation, in synthetic data. Empirically, PaD-TS achieves substantial gains in population-level preservation (FDDS and VDS improved by approximately 5.7–5.9x) while maintaining competitive or superior per-sample authenticity and predictive utility compared with strong baselines. This approach reduces distribution shifts in synthetic TS data and enhances the reliability of downstream analyses in high-stakes domains.

Abstract

Diffusion models have shown promising ability in generating high-quality time series (TS) data. Despite this initial success, existing works mostly focus on the authenticity of data at the individual level and pay less attention to preserving the population-level properties of the entire dataset. Such population-level properties include the value distribution of each dimension and the distributions of certain functional dependencies (e.g., cross-correlation, CC) between different dimensions. For instance, when generating house energy consumption TS data, the value distributions of the outside temperature and the kitchen temperature should be preserved, as well as the distribution of CC between them. Preserving such TS population-level properties is critical for maintaining the statistical insights of the datasets, mitigating model bias, and augmenting downstream tasks like TS prediction. Yet it is often overlooked by existing models, so the data they generate often exhibit distribution shifts from the original data. We propose Population-aware Diffusion for Time Series (PaD-TS), a new TS generation model that better preserves the population-level properties. The key novelties of PaD-TS include 1) a new training method that explicitly incorporates TS population-level property preservation, and 2) a new dual-channel encoder model architecture that better captures the TS data structure. Empirical results on major benchmark datasets show that PaD-TS improves the average CC distribution shift score between real and synthetic data by 5.9x while maintaining performance comparable to state-of-the-art models on individual-level authenticity.
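To make the idea of a population-level property concrete, the following minimal sketch computes one CC value per sample between two dimensions and then compares how those values are distributed across a real and a synthetic population. Everything here is an assumption made for illustration: the toy two-dimensional data, the function names, and the use of a total variation distance between histograms as the gap measure. It is not the paper's distribution shift score or its training objective.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cc_per_sample(dataset, i, j):
    """One CC value per sample: correlation over time between dimensions i and j."""
    return [pearson([step[i] for step in sample], [step[j] for step in sample])
            for sample in dataset]


def histogram(values, bins=20, lo=-1.0, hi=1.0):
    """Normalized histogram of values over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = int((v - lo) / width)
        counts[min(max(k, 0), bins - 1)] += 1
    return [c / len(values) for c in counts]


def histogram_gap(a, b):
    """Total variation distance between two histograms, in [0, 1].
    Illustrative stand-in only; NOT the paper's distribution shift score."""
    return 0.5 * sum(abs(p - q) for p, q in zip(histogram(a), histogram(b)))


def toy_dataset(rng, n_samples, coupling_fn, length=24):
    """Hypothetical 2-dimensional series; coupling_fn draws each sample's target CC."""
    data = []
    for _ in range(n_samples):
        rho = coupling_fn(rng)
        sample = []
        for _ in range(length):
            x = rng.gauss(0.0, 1.0)
            y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            sample.append((x, y))
        data.append(sample)
    return data


def moderate(rng):
    """Target CC spread over a moderate range, as in the toy 'real' data."""
    return rng.uniform(0.2, 0.8)


def extreme(rng):
    """Target CC pushed towards -1 or 1, mimicking a collapsed generator."""
    return rng.choice([-0.99, 0.99])


rng = random.Random(0)
real = toy_dataset(rng, 500, moderate)
faithful = toy_dataset(rng, 500, moderate)   # same population behaviour as real
collapsed = toy_dataset(rng, 500, extreme)   # CC values pile up near the extremes

real_cc = cc_per_sample(real, 0, 1)
gap_faithful = histogram_gap(real_cc, cc_per_sample(faithful, 0, 1))
gap_collapsed = histogram_gap(real_cc, cc_per_sample(collapsed, 0, 1))
print(f"faithful gap:  {gap_faithful:.3f}")
print(f"collapsed gap: {gap_collapsed:.3f}")
```

Each sample in the collapsed set is a perfectly plausible pair of strongly correlated series, so a per-sample check would not flag it. The gap shows up only when the CC values are compared as a distribution over the whole population.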

Paper Structure

This paper contains 19 sections, 19 equations, 8 figures, 9 tables, 1 algorithm.

Figures (8)

  • Figure 1: Histograms of the CC distribution in the original and synthetic Energy datasets. The CC values are calculated between the outside temperature and the kitchen temperature. PaD-TS (top left) best preserves this functional dependency distribution. Previous models tend to generate data points with a CC score close to 1 or -1, which biases downstream tasks.
  • Figure 2: PaD-TS model architecture
  • Figure 3: t-SNE plots of the cross-correlation values of original data (red dots) and synthetic data (blue dots) on the Sines and Stocks datasets.
  • Figure 4: t-SNE plots of the cross-correlation values of original data (red dots) and synthetic data (blue dots) on the Energy dataset.
  • Figure 5: Ablation study on $\alpha$ for the Energy dataset. The blue and red curves depict the FDDS and VDS scores, respectively.
  • ...and 3 more figures