TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series
Alexander Nikitin, Letizia Iannucci, Samuel Kaski
TL;DR
TSGM tackles the challenge of limited and sensitive time-series data by providing a flexible, open-source framework that unifies data-driven and simulator-based generative methods for synthetic time series. It introduces a comprehensive evaluation suite across realism, predictive consistency, privacy, fairness, and downstream utility, and supplies an Architecture Zoo, augmentation tools, built-in datasets, and a CLI to facilitate rapid experimentation and deployment. The framework enables researchers and practitioners to benchmark methods, compare metrics, and accelerate safe data sharing and augmentation in time-series domains. The practical impact lies in lowering barriers to using synthetic time series in privacy-preserving data science and in enabling reproducible comparisons across methods.
Abstract
Temporally indexed data are essential in a wide range of fields and of interest to machine learning researchers. Time series data, however, are often scarce or highly sensitive, which precludes sharing data between researchers and industrial organizations and applying existing and new data-intensive ML methods. A possible solution to this bottleneck is to generate synthetic data. In this work, we introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series. TSGM includes a broad repertoire of machine learning methods: generative models, probabilistic approaches, and simulator-based approaches. The framework enables users to evaluate the quality of the produced data from different angles: similarity, downstream effectiveness, predictive consistency, diversity, and privacy. The framework is extensible, which allows researchers to rapidly implement their own methods and compare them in a shareable environment. TSGM was tested on open datasets and in production and proved beneficial in both cases. In addition to the library, the project provides command-line interfaces for synthetic data generation, which lower the entry threshold for those without a programming background.
