GenFormer: A Deep-Learning-Based Approach for Generating Multivariate Stochastic Processes
Haoran Zhao, Wayne Isaac Tan Uy
TL;DR
GenFormer generates synthetic multivariate stochastic processes at many spatial locations over long horizons by coupling a univariate Markov-state model (constructed via clustering) with a Transformer-based mapping from Markov states to time-series values. A post-processing pipeline that combines a Cholesky-based transformation with reshuffling yields exact marginal distributions while approximately preserving spatial correlations and higher-order statistics. On synthetic SDE data and Florida wind speeds, the approach scales well, reproduces the target marginals exactly, and improves the fidelity of statistical properties beyond second moments, which leads to more reliable exceedance probability estimates for risk management. The framework offers a practical, high-dimensional stochastic generator for reliability analysis, parametric insurance, and other engineering applications that depend on rich temporal and spatial dependencies.
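The two post-processing ideas named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `cholesky_recorrelate` and `reshuffle_to_marginals`, the order of the two steps, and the use of rank matching for the reshuffling are assumptions made for illustration. The first function linearly transforms the columns of a sample so that its sample correlation matrix equals a target; the second replaces each column's values with reference values of the same rank, so the empirical marginals match the reference exactly.

```python
import numpy as np

def cholesky_recorrelate(x, target_corr):
    """Linearly transform the columns of x so that the sample correlation
    matrix of the result equals target_corr (hypothetical helper)."""
    # Standardize each column (zero mean, unit variance).
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    # Cholesky factors of the current and the target correlation matrices.
    l_sample = np.linalg.cholesky(np.corrcoef(z, rowvar=False))
    l_target = np.linalg.cholesky(target_corr)
    # Whiten with the sample factor, then color with the target factor.
    return z @ np.linalg.inv(l_sample).T @ l_target.T

def reshuffle_to_marginals(x, reference):
    """Replace each value in x with the reference value of the same rank,
    column by column (hypothetical helper; assumes equal row counts)."""
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        ranks = np.argsort(np.argsort(x[:, j]))
        out[:, j] = np.sort(reference[:, j])[ranks]
    return out

rng = np.random.default_rng(0)
reference = rng.gamma(2.0, size=(500, 3))   # stands in for observed data
raw = rng.normal(size=(500, 3))             # stands in for model output
target = np.array([[1.0, 0.6, 0.3],
                   [0.6, 1.0, 0.5],
                   [0.3, 0.5, 1.0]])
correlated = cholesky_recorrelate(raw, target)
synthetic = reshuffle_to_marginals(correlated, reference)
```

In this sketch the reshuffling step keeps the rank ordering produced by the Cholesky step, so the marginals are exact while the linear correlations are only approximately retained, which mirrors the exact-versus-approximate distinction drawn in the summary.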
Abstract
Stochastic generators are essential to produce synthetic realizations that preserve target statistical properties. We propose GenFormer, a stochastic generator for spatio-temporal multivariate stochastic processes. It is constructed using a Transformer-based deep learning model that learns a mapping between a Markov state sequence and time series values. The synthetic data generated by the GenFormer model preserves the target marginal distributions and approximately captures other desired statistical properties even in challenging applications involving a large number of spatial locations and a long simulation horizon. The GenFormer model is applied to simulate synthetic wind speed data at various stations in Florida to calculate exceedance probabilities for risk management.
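To make the notion of a Markov state sequence concrete, the following NumPy sketch discretizes a univariate series into states, estimates a transition matrix, and samples a new state sequence. Several choices here are assumptions rather than details from the paper: quantile binning stands in for the clustering step, the chain is taken to be first-order, and the names `fit_markov_states` and `sample_states` are hypothetical. The Transformer that maps states to time-series values is not shown.

```python
import numpy as np

def fit_markov_states(series, n_states):
    """Assign each value of a 1-D series to a state and estimate a
    first-order transition matrix (hypothetical helper)."""
    # Assumption: quantile bins replace the clustering used in the paper.
    edges = np.quantile(series, np.linspace(0, 1, n_states + 1)[1:-1])
    states = np.searchsorted(edges, series, side="right")
    # Count observed transitions between consecutive states.
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (states[:-1], states[1:]), 1)
    # Normalize rows; rows without observed transitions become uniform.
    row_sums = counts.sum(axis=1, keepdims=True)
    trans = np.where(row_sums > 0,
                     counts / np.maximum(row_sums, 1),
                     1.0 / n_states)
    return states, trans

def sample_states(trans, initial_state, length, rng):
    """Sample a state sequence of the given length from the chain."""
    states = np.empty(length, dtype=int)
    states[0] = initial_state
    for t in range(1, length):
        states[t] = rng.choice(trans.shape[0], p=trans[states[t - 1]])
    return states

rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=2000))   # toy series, not real data
observed_states, trans = fit_markov_states(series, n_states=5)
synthetic_states = sample_states(trans, observed_states[0], 300, rng)
```

A sampled sequence such as `synthetic_states` would then serve as the input from which a learned model produces time-series values.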
