SigDiffusions: Score-Based Diffusion Models for Time Series via Log-Signature Embeddings

Barbora Barancikova, Zhuoyue Huang, Cristopher Salvi

TL;DR

SigDiffusion is a score-based diffusion framework that operates on log-signature embeddings to generate long multivariate time series while preserving the algebraic structure of signatures. It combines a forward noising process in the Lie algebra $\mathcal{L}^n(\mathbb{R}^d)$ with newly derived closed-form inversion formulae, which reconstruct a path from its log-signature by expressing its Fourier or orthogonal polynomial coefficients as polynomial functions of the log-signature. Empirical results on synthetic and real datasets show performance competitive with state-of-the-art diffusion models, supported by inversion evaluations and model-capacity analyses. Future directions include alternative path embeddings and discrete-time signature approaches.

Abstract

Score-based diffusion models have recently emerged as state-of-the-art generative models for a variety of data modalities. Nonetheless, it remains unclear how to adapt these models to generate long multivariate time series. Viewing a time series as the discretisation of an underlying continuous process, we introduce SigDiffusion, a novel diffusion model operating on log-signature embeddings of the data. The forward and backward processes gradually perturb and denoise log-signatures while preserving their algebraic structure. To recover a signal from its log-signature, we provide new closed-form inversion formulae expressing the coefficients obtained by expanding the signal in a given basis (e.g. Fourier or orthogonal polynomials) as explicit polynomial functions of the log-signature. Finally, we show that combining SigDiffusions with these inversion formulae results in high-quality long time series generation, competitive with the current state-of-the-art on various datasets of synthetic and real-world examples.
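
One way to see what "preserving their algebraic structure" means is a depth-2 toy computation. The sketch below is illustrative only and is not the authors' implementation: it assumes truncation at level 2, a single Gaussian perturbation with a hypothetical scale `sigma` (no noise schedule or score network), and helper names chosen here for illustration. At this depth a log-signature is a vector plus an antisymmetric matrix, and a valid signature must satisfy the shuffle identity $S^{ij} + S^{ji} = S^{i} S^{j}$.

```python
import numpy as np

def exp_depth2(l1, l2):
    """Map a depth-2 log-signature (vector, antisymmetric matrix) to a signature."""
    return l1, l2 + 0.5 * np.outer(l1, l1)

def log_depth2(s1, s2):
    """Inverse of exp_depth2: map a depth-2 signature to its log-signature."""
    return s1, s2 - 0.5 * np.outer(s1, s1)

def shuffle_defect(s1, s2):
    """Largest violation of the level-2 shuffle identity S^{ij} + S^{ji} = S^i S^j."""
    return np.abs(s2 + s2.T - np.outer(s1, s1)).max()

def perturb_log_signature(l1, l2, sigma, rng):
    """Add Gaussian noise in the Lie algebra (assumed toy perturbation).

    Level-2 noise is antisymmetrised so the result is still a log-signature.
    """
    upper = np.triu(rng.normal(size=l2.shape), 1)
    return l1 + sigma * rng.normal(size=l1.shape), l2 + sigma * (upper - upper.T)

rng = np.random.default_rng(0)
sigma = 0.5  # hypothetical noise scale

# A valid depth-2 signature built from an arbitrary log-signature.
l1 = np.array([1.0, -2.0, 0.5])
l2 = np.array([[0.0, 0.3, -0.1],
               [-0.3, 0.0, 0.7],
               [0.1, -0.7, 0.0]])
s1, s2 = exp_depth2(l1, l2)

# Noise added to the log-signature, then mapped back to the group.
noisy_s1, noisy_s2 = exp_depth2(*perturb_log_signature(l1, l2, sigma, rng))
defect_log = shuffle_defect(noisy_s1, noisy_s2)

# Noise added directly to the signature entries.
direct_s2 = s2 + sigma * rng.normal(size=s2.shape)
defect_direct = shuffle_defect(s1, direct_s2)

print(f"defect after noising the log-signature: {defect_log:.2e}")
print(f"defect after noising the signature:     {defect_direct:.2e}")
```

In this toy setting the perturbed log-signature still exponentiates to an element satisfying the shuffle identity up to floating-point error, whereas noise added directly to the signature entries generally does not.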

Paper Structure (47 sections, 12 theorems, 102 equations, 12 figures, 5 tables)

Key Result

Lemma 2.0.1

For any two smooth paths $x, y : [0,1] \to \mathbb{R}^d$ the following holds:

$$S(x * y) = S(x) \cdot S(y),$$

where $S$ denotes the signature, $*$ denotes path concatenation, and $\cdot$ is the signature tensor product defined in the paper's tensor-product equation.
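
Chen's relation says that the signature of a concatenated path equals the tensor product of the two signatures. It can be checked numerically on piecewise-linear paths. The following is a minimal sketch under stated assumptions: signatures are truncated at level 2, paths are given as arrays of sample points joined by straight segments, and the function names are hypothetical rather than taken from the paper or any signature library.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    `path` has shape (T, d). Level 2 is the iterated integral
    int (x_t - x_0) (tensor) dx_t, which is exact for linear segments.
    """
    increments = np.diff(path, axis=0)      # segment increments, shape (T-1, d)
    offsets = path[:-1] - path[0]           # displacement at the start of each segment
    level1 = path[-1] - path[0]
    level2 = offsets.T @ increments + 0.5 * increments.T @ increments
    return level1, level2

def concatenate(x, y):
    """Path concatenation x * y: translate y so it starts where x ends."""
    shifted = y - y[0] + x[-1]
    return np.vstack([x, shifted[1:]])

def tensor_product_level2(a, b):
    """Truncated tensor product of two depth-2 signatures (scalar term is 1)."""
    a1, a2 = a
    b1, b2 = b
    return a1 + b1, a2 + np.outer(a1, b1) + b2

rng = np.random.default_rng(1)
x = rng.normal(size=(20, 3)).cumsum(axis=0)  # two random-walk paths in R^3
y = rng.normal(size=(15, 3)).cumsum(axis=0)

sig_concat = signature_level2(concatenate(x, y))
sig_product = tensor_product_level2(signature_level2(x), signature_level2(y))

print(np.allclose(sig_concat[0], sig_product[0]),
      np.allclose(sig_concat[1], sig_product[1]))
```

The left-hand side is computed by direct integration over the concatenated path, so the comparison does not rely on Chen's relation itself.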

Figures (12)

  • Figure 1: SigDiffusions pipeline. The signatures of a time series dataset are points distributed in a non-Euclidean space (Lie group). Converting to log-signatures maps them to a Euclidean space (Lie algebra) where standard diffusion models operate. Calculating the log-signature embedding and its inverse (blue box) are fully deterministic operations, which greatly simplifies the learning task. The log-signatures serve as inputs to a score-based diffusion model (orange box). Step 6 is enabled by our newly derived closed-form inversion formulae.
  • Figure 2: Comparison of different inversion methods.
  • Figure 3: $L_2$ error of signature inversion via orthogonal polynomials with respect to the polynomial order $N$ and time. Error and time are calculated by an average over 15 paths with $200$ sample points.
  • Figure 4: Time series representation and model capacity trade-off. Left to right: real sample from a noisy Lotka–Volterra system, sample generated by SigDiffusion with signature truncation level 3, 4, 5, 6.
  • Figure 5: Path in the signed-area example. The shaded region represents the signed Lévy area.
  • ...and 7 more figures
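
The signed Lévy area mentioned in the Figure 5 caption is the antisymmetric part of the level-2 signature, $\tfrac{1}{2}(S^{12} - S^{21})$, which is also the level-2 log-signature coordinate of a 2-dimensional path. As a small illustration, assuming a piecewise-linear path given by its sample points and a hypothetical helper name `levy_area`, it can be computed directly:

```python
import numpy as np

def levy_area(path):
    """Signed Levy area of a piecewise-linear 2-dimensional path.

    Equals 0.5 * int ((x - x_0) dy - (y - y_0) dx); the quadratic
    within-segment terms cancel under antisymmetrisation.
    """
    increments = np.diff(path, axis=0)   # segment increments, shape (T-1, 2)
    offsets = path[:-1] - path[0]        # displacement at the start of each segment
    return 0.5 * np.sum(offsets[:, 0] * increments[:, 1]
                        - offsets[:, 1] * increments[:, 0])

# Unit square traversed counter-clockwise, then clockwise.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
area_ccw = levy_area(square)
area_cw = levy_area(square[::-1])
print(area_ccw, area_cw)
```

For a closed loop this reduces to the enclosed area, with the sign determined by the direction of travel.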

Theorems & Definitions (27)

  • Example 2.1
  • Lemma 2.0.1: Chen's relation
  • Theorem 3.1
  • Theorem 3.2
  • Remark
  • Lemma A.0.1: Shuffle identity
  • Lemma A.0.2: Chen–Chow
  • Example A.1: Geometric interpretation of a 2-dimensional path
  • Example A.2: Signatures of linear paths
  • Definition A.1: Concatenation
  • ...and 17 more