TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series

Alexander Nikitin, Letizia Iannucci, Samuel Kaski

TL;DR

TSGM tackles the challenge of limited and sensitive time-series data by providing a flexible, open-source framework that unifies data-driven and simulator-based generative methods for synthetic time series. It introduces a comprehensive evaluation suite across realism, predictive consistency, privacy, fairness, and downstream utility, and supplies an Architecture Zoo, augmentation tools, built-in datasets, and a CLI to facilitate rapid experimentation and deployment. The framework enables researchers and practitioners to benchmark methods, compare metrics, and accelerate safe data sharing and augmentation in time-series domains. The practical impact lies in lowering barriers to using synthetic time series in privacy-preserving data science and in enabling reproducible comparisons across methods.

Abstract

Temporally indexed data are essential in a wide range of fields and of interest to machine learning researchers. Time series data, however, are often scarce or highly sensitive, which precludes both the sharing of data between researchers and industrial organizations and the application of existing and new data-intensive ML methods. A possible solution to this bottleneck is to generate synthetic data. In this work, we introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series. TSGM includes a broad repertoire of machine learning methods: generative models, probabilistic methods, and simulator-based approaches. The framework enables users to evaluate the quality of the produced data from different angles: similarity, downstream effectiveness, predictive consistency, diversity, and privacy. The framework is extensible, which allows researchers to rapidly implement their own methods and compare them in a shareable environment. TSGM was tested on open datasets and in production and proved to be beneficial in both cases. In addition to the library, the project provides command-line interfaces for synthetic data generation, which lowers the entry threshold for those without a programming background.

Paper Structure

This paper contains 15 sections, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: The architecture of TSGM. The generators are the core of the framework, implementing various generative methods for time series. Architecture Zoo provides a collection of NN architectures that can be reused at different stages of the pipeline; it can be extended by user-defined models. The monitorings module provides a set of routines for examining the training procedure and helps to check convergence and intermediate results. The statistics module implements summary statistics used by metrics. The metrics module evaluates the quality of the generated data and is used either during training or for the final evaluation of the generated data. The code example (right) demonstrates synthetic dataset generation with TSGM.
  • Figure 2: The taxonomy of generative methods in TSGM. Simulation-based methods allow users to specify a simulator. Data-driven methods do not require users to specify the generative process and model time series purely from data.
  • Figure 3: The first subfigure shows the original temporally labeled time series, and the second presents synthetic data generated by cGAN. Each graph shows conditions (green lines) and time series (blue lines). We can observe that the synthetic data resemble the pattern of the original data.
  • Figure 4: Comparing the methods for data sharing (left) and data augmentation (right) tasks across three datasets. The values represent the fraction of cases (metric and dataset pairs) where a method from a row performs better than a method from a column. t-SNE visualizes individual historical ($\blacktriangleleft$) and generated ($\bullet$) time series.
  • Figure C.1: TSGM Monitoring. The first subfigure shows the values of the various losses used for training TimeGAN, and the remaining four subfigures show t-SNE analysis of training data and synthetic data generated by TimeGAN after 1000, 3000, 7000, and 12000 epochs, respectively. Training data are marked with red triangles, whereas synthetic data are marked with blue hexagons.
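
The pairwise comparison described in the Figure 4 caption (the fraction of metric and dataset pairs where a row method beats a column method) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not TSGM code: the function name `win_fraction_matrix`, the score values, and the assumption that every score is oriented so that higher is better are all hypothetical and do not come from the paper.

```python
def win_fraction_matrix(scores):
    """Return matrix[i][j]: fraction of cases where method i beats method j.

    `scores[i][c]` is the score of method i on case c, where a case is one
    (metric, dataset) pair. Assumption: all scores are oriented so that
    higher is better; ties count as a win for neither method.
    """
    n_methods = len(scores)
    n_cases = len(scores[0])
    matrix = [[0.0] * n_methods for _ in range(n_methods)]
    for i in range(n_methods):
        for j in range(n_methods):
            if i == j:
                continue  # a method is never compared with itself
            wins = sum(1 for a, b in zip(scores[i], scores[j]) if a > b)
            matrix[i][j] = wins / n_cases
    return matrix


# Hypothetical scores: three methods evaluated on four (metric, dataset) cases.
scores = [
    [0.9, 0.8, 0.7, 0.6],  # method A
    [0.5, 0.9, 0.6, 0.5],  # method B
    [0.4, 0.3, 0.8, 0.4],  # method C
]
fractions = win_fraction_matrix(scores)
```

Here `fractions[0][1]` is the share of cases in which method A outscores method B. Metrics for which lower is better would need to be negated (or otherwise reoriented) before being passed in.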