GenFormer: A Deep-Learning-Based Approach for Generating Multivariate Stochastic Processes

Haoran Zhao, Wayne Isaac Tan Uy

TL;DR

GenFormer tackles the challenge of generating synthetic multivariate stochastic processes with many spatial locations and long horizons by coupling a univariate Markov-state model (constructed via clustering) with a Transformer-based mapping from Markov states to time-series values. A post-processing pipeline using Cholesky-based transformation and reshuffling ensures exact marginal distributions while preserving (or approximating) spatial correlations and higher-order statistics. The approach demonstrates scalability and accuracy on synthetic SDE data and Florida wind speeds, achieving exact marginals and improved higher-order properties beyond second moments, which translates into more reliable exceedance probability estimates for risk management. This framework offers a practical, high-dimensional stochastic generator suitable for reliability analysis, parametric insurance, and other engineering applications where rich temporal and spatial dependencies are essential.
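The reshuffling step mentioned above can be illustrated with a rank-based reordering. This is a minimal sketch, not the paper's procedure: it assumes reshuffling means placing a sample drawn from the target marginal in the rank order of the generated series, and the names `reshuffle_to_target`, `generated`, and `target_sample` are hypothetical.

```python
def reshuffle_to_target(generated, target_sample):
    """Reorder target_sample so that its ranks follow those of generated.

    Assumption: rank-based reordering stands in for the paper's
    reshuffling step; the exact procedure may differ.
    """
    if len(generated) != len(target_sample):
        raise ValueError("sequences must have equal length")
    # Indices of the generated values, from smallest to largest.
    order = sorted(range(len(generated)), key=lambda i: generated[i])
    sorted_target = sorted(target_sample)
    out = [0.0] * len(generated)
    # The k-th smallest generated value is replaced by the
    # k-th smallest value of the target sample.
    for rank, idx in enumerate(order):
        out[idx] = sorted_target[rank]
    return out


generated = [0.3, -1.2, 2.5, 0.9]       # hypothetical model output
target_sample = [10.0, 4.0, 7.0, 5.5]   # hypothetical draw from the target marginal
reshuffled = reshuffle_to_target(generated, target_sample)
```

Because the output is a permutation of `target_sample`, its empirical marginal distribution matches that sample exactly, while the temporal ordering (peaks and troughs) follows the generated series.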

Abstract

Stochastic generators are essential to produce synthetic realizations that preserve target statistical properties. We propose GenFormer, a stochastic generator for spatio-temporal multivariate stochastic processes. It is constructed using a Transformer-based deep learning model that learns a mapping between a Markov state sequence and time series values. The synthetic data generated by the GenFormer model preserves the target marginal distributions and approximately captures other desired statistical properties even in challenging applications involving a large number of spatial locations and a long simulation horizon. The GenFormer model is applied to simulate synthetic wind speed data at various stations in Florida to calculate exceedance probabilities for risk management.
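To make the exceedance-probability use case concrete, here is a minimal Monte Carlo sketch. It assumes the event of interest is "the maximum over the simulation horizon at one station exceeds a threshold"; the paper may define the event differently, and `exceedance_probability` and the sample values are hypothetical.

```python
def exceedance_probability(realizations, threshold):
    """Fraction of realizations whose maximum exceeds the threshold.

    Assumption: each realization is one synthetic time series at a
    single station, and the event is "max over the horizon > threshold".
    """
    if not realizations:
        raise ValueError("need at least one realization")
    hits = sum(1 for path in realizations if max(path) > threshold)
    return hits / len(realizations)


# Hypothetical synthetic wind-speed paths (arbitrary units).
synthetic = [
    [3.1, 4.0, 9.5],
    [2.2, 2.8, 3.0],
    [8.1, 7.9, 6.5],
    [1.0, 5.0, 4.2],
]
p_exceed = exceedance_probability(synthetic, threshold=8.0)
```

In practice the estimate is only as good as the generator's tail behavior, which is why preserving marginals and higher-order statistics matters for this application.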

Paper Structure (30 sections, 12 equations, 17 figures, 5 tables, 1 algorithm)

Figures (17)

  • Figure 1: Deep learning model architecture based on the encoder-decoder framework. The model processes inputs through an embedding layer (red block), generating the hidden representation which undergoes further updates in the encoder layers (gray block). The decoder (green block), in conjunction with a linear layer (purple block), utilizes the hidden representation from the encoder for generative inference, yielding the predicted sequence (highlighted in yellow).
  • Figure 2: Transformer-based deep learning model with Markov state embedding. The proposed approach includes a Markov state embedding in addition to the value and time embedding present in the embedding layer. The remainder of the model architecture is the same as in Figure 1.
  • Figure 3: Deep learning model for Markov state sequence generation when the Markov order $p \ge 2$. We adopt a decoder-only structure without a cross-attention mechanism. The input of the model is the Markov states at the previous $p$ time stamps, concatenated with a vector of length 1. This is passed to an embedding layer and multiple decoder blocks. The Softmax layer normalizes the weights of the Markov states to obtain probabilities, which the multinomial random variable generator uses to generate synthetic Markov states.
  • Figure 4: Construction of input-output data pairs. For each sequence of realizations, we apply a sliding window of length $q^{\text{enc}}_{\text{in}} + q_{\text{out}}$ to the time series matrix $\boldsymbol{\mathcal{X}}$ and the vectors $\boldsymbol{\mathcal{Y}}$ and $\boldsymbol{\mathcal{T}}$ of Markov state and time sequences. The first $q^{\text{enc}}_{\text{in}}$ components of the window are inputs to the deep learning model while the subsequent $q_{\text{out}}$ components constitute the target output sequence for the model.
  • Figure 5: Scatter plot of the normalized frequencies of Markov states in the observed and simulated sequences. Generating Markov state sequences by estimating the transition matrix from data is computationally challenging for large Markov order $p$. This example shows that for large $p$, the trained deep learning model for Markov state sequence generation can closely reproduce the frequencies of Markov states in the observed Markov state sequence data.
  • ...and 12 more figures
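
The sliding-window construction described in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions: the window advances one time step at a time, and each pair carries the slices of all three sequences (the caption does not say exactly which of them enter the target). The function name `make_windows` and the dictionary keys are hypothetical.

```python
def make_windows(X, Y, T, q_enc_in, q_out):
    """Build (input, target) pairs with a sliding window.

    X: list of rows (time series values, one row per time stamp)
    Y: list of Markov states, T: list of time stamps
    Assumption: stride of 1; both halves keep X, Y and T slices.
    """
    if not (len(X) == len(Y) == len(T)):
        raise ValueError("X, Y and T must have the same length")
    window = q_enc_in + q_out
    pairs = []
    for start in range(len(X) - window + 1):
        split = start + q_enc_in   # end of the encoder input part
        end = start + window       # end of the target part
        inputs = {"X": X[start:split], "Y": Y[start:split], "T": T[start:split]}
        targets = {"X": X[split:end], "Y": Y[split:end], "T": T[split:end]}
        pairs.append((inputs, targets))
    return pairs


# Hypothetical data: 6 time stamps, 2 spatial locations.
X = [[0.1, 1.1], [0.2, 1.2], [0.3, 1.3], [0.4, 1.4], [0.5, 1.5], [0.6, 1.6]]
Y = [0, 1, 1, 2, 0, 1]
T = [0, 1, 2, 3, 4, 5]
pairs = make_windows(X, Y, T, q_enc_in=3, q_out=2)
```

With 6 time stamps and a window of length 3 + 2 = 5, the sketch yields two overlapping input-output pairs.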