A Survey of Transformer Enabled Time Series Synthesis

Alexander Sommers, Logan Cummins, Sudip Mittal, Shahram Rahimi, Maria Seale, Joseph Jaboure, Thomas Arnold

TL;DR

This survey identifies a gap at the intersection of transformer networks and time-series generation, highlighting the potential for transformers to advance data augmentation, privacy preservation, and explainability in time-series domains. It catalogs twelve transformer-enabled TS generative works, classifying them by task (imputation, forecasting, synthesis) and by architectural lineage (GAN-based, diffusion, state-space hybrids, and hybrid TSA architectures). Key findings include a dominance of transformer-based encoders/decoders, the emergence of hybrid models that pair autoregressive and direct horizon predictions, and the need for standardized benchmarks to enable fair comparisons. The work emphasizes opportunities in leveraging inductive biases, transferring pretrained models to data-scarce settings, and exploring short-, mid-, and long-range dependency modeling with TCNs, SSMs, and attention mechanisms to advance robust, conditional TS generation.

Abstract

Generative AI has received much attention in the image and language domains, with the transformer neural network continuing to dominate the state of the art. Application of these models to time series generation is less explored, however, and is of great utility to machine learning, privacy preservation, and explainability research. The present survey identifies this gap at the intersection of the transformer, generative AI, and time series data, and reviews works in this sparsely populated subdomain. The reviewed works show great variety in approach, and have not yet converged on a conclusive answer to the problems the domain poses. GANs, diffusion models, state space models, and autoencoders were all encountered alongside or surrounding the transformers which originally motivated the survey. While too open a domain to offer conclusive insights, the works surveyed are quite suggestive, and several recommendations for best practice, and suggestions of valuable future work, are provided.

Paper Structure (29 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: GANs are primarily composed of two modules, the generator and discriminator. The trained generator satisfies the desiderata of the users. To apply evolutionary pressure, the discriminator is trained to distinguish synthetic and authentic samples. The adversarial game trains the generator to better deceive the discriminator, producing realistic samples. The ideal convergence scenario sees the discriminator only able to guess (50-50) as to the authenticity of a presented instance. Conditional information can be supplied at multiple points if needed.
  • Figure 2: A full TNN has two stacks, the encoder (left) and decoder (right), stemming from earlier sequence-to-sequence models. Both stack types are made of successive blocks, with only one block in each stack shown here. Key-Query-Value attention mechanisms project the query source as a linear combination of the Value source as a function of affinity with the Key source. Self-Attention uses the same sequence as all three sources, while Encoder-Decoder Attention uses the encoder stack output as the Key and Value source. Vaswani et al. used "post-norm" operations, placing the normalization after each primary operation. More recent work suggests that "pre-norm" configurations are superior [postnorm_prenorm].
  • Figure 3: Diffusion models in the DiffWave lineage typically consist of several denoising blocks, each with a skip connection. When generation is conditioned, conditional information is supplied alongside the noise input. Noise is drawn from the standard normal distribution $\mathbf{Z}$. An embedding indicating each step in a diffusion schedule is supplied to each module, while only the first module accepts the noise input and any conditional data. Each input has its own embedding pipeline. The final output module accepts all skip connection outputs as well as the final denoised output.
  • Figure 4: This is an abstract view of a denoising block as used in diffusion models of the DiffWave lineage. A view of this module in situ with its neighbors can be seen in Figure 3. The exact implementation of the submodules (CNNs, TCNs, TNNs, SSMs, etc.) can vary, so here they are represented as "processes". The first process conditions the input to the block with the time step in the noise/diffusion schedule (DiffWave provides more details on diffusion schedules, and the "forward" and "reverse" processes). If conditioned generation is implemented, and this is the first block, then a primary conditioning process is supplied. The gated activation process, inherited from WaveNet, feeds both outputs. The feed forward process goes to the next block, or the output, and the skip connection jumps directly to the output pipeline, which accepts all skip connections.
  • Figure 5: A single state space module executes a three part process of projection, evolution, and a skip connection. Projection matrix B casts a sequence to a higher dimensionality. This projection is treated as coordinates in a state space, a vector space describing the attributes of a dynamic system. The transition matrix A applies several partial differential equations to this state vector, progressing it forward in time by a discrete time step. The C projection returns the evolved state to the original dimensionality. Transformation D acts as a learnable skip connection. With trained parameters, this facilitates an analog to an RNN's autoregressive predictions, but can be trained like a CNN, with a great deal of precomputation, as achieved in S4.
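
The Key-Query-Value attention described in the Figure 2 caption can be illustrated with a minimal sketch. It is not taken from the surveyed works: the function names `softmax` and `attention` are hypothetical, the learned query, key, and value projections of a real transformer block are omitted, and plain Python lists stand in for tensors. The sketch shows only how each query position receives a weighted combination of the Value source, with weights given by affinity to the Key source.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention (assumption: no learned projections).

    queries: list of vectors, one per output position
    keys, values: lists of vectors, one per source position
    """
    d_k = len(keys[0])
    out = []
    for q in queries:
        # Affinity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is a weighted combination of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Self-attention: one sequence serves as query, key, and value source.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
self_out = attention(seq, seq, seq)

# Encoder-decoder attention: keys and values come from the encoder output.
encoder_out = [[0.5, 0.5], [2.0, -1.0]]
cross_out = attention(seq, encoder_out, encoder_out)
```

Because the softmax weights are non-negative and sum to one, every output row lies within the range spanned by the value vectors, which is the sense in which attention forms a linear combination of the Value source.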
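
The state space module described in the Figure 5 caption can likewise be sketched under stated assumptions. The code below assumes that A and B are already discretized, that the input is a scalar sequence, and that the initial state is zero. The names `ssm_scan`, `ssm_kernel`, and `ssm_conv` are hypothetical. The kernel is built by naive repeated multiplication, which is not the efficient algorithm used in S4. The sketch only illustrates why the same map can be run step by step like an RNN or applied as a precomputed convolution like a CNN.

```python
def ssm_scan(A, B, C, D, u):
    """Recurrent view: x_k = A x_{k-1} + B u_k, y_k = C x_k + D u_k."""
    n = len(B)
    x = [0.0] * n  # assumption: zero initial state
    ys = []
    for u_k in u:
        # B projects the input into the state space; A evolves the state.
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u_k
             for i in range(n)]
        # C projects back to the output; D acts as a skip connection.
        ys.append(sum(C[i] * x[i] for i in range(n)) + D * u_k)
    return ys


def ssm_kernel(A, B, C, length):
    """Precompute the kernel K = (CB, CAB, CA^2B, ...) of the given length."""
    n = len(B)
    v = list(B)
    kernel = []
    for _ in range(length):
        kernel.append(sum(C[i] * v[i] for i in range(n)))
        v = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return kernel


def ssm_conv(A, B, C, D, u):
    """Convolutional view: y_k = sum_j K_j u_{k-j} + D u_k."""
    K = ssm_kernel(A, B, C, len(u))
    return [sum(K[j] * u[k - j] for j in range(k + 1)) + D * u[k]
            for k in range(len(u))]


# Illustrative parameters only; in practice these would be learned.
A = [[0.5, 0.0], [0.1, 0.9]]
B = [1.0, 0.5]
C = [1.0, -1.0]
D = 0.2
u = [1.0, 0.0, 2.0, -1.0]

y_recurrent = ssm_scan(A, B, C, D, u)
y_convolved = ssm_conv(A, B, C, D, u)
```

The two outputs agree up to floating-point error, since unrolling the recurrence from a zero state yields exactly the causal convolution of the input with the kernel, plus the skip term.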