Learning with Expected Signatures: Theory and Applications

Lorenzo Lucchese, Mikko S. Pakkanen, Almut E. D. Veraart

TL;DR

The paper addresses the challenge of deriving probabilistic guarantees for model-free embeddings built from the expected signature of data streams. It introduces a unified theory linking discrete, pathwise signatures to their continuous-time counterparts under canonical geometric rough paths, proving consistency and asymptotic normality in both in-fill and long-span regimes. A martingale-correction variant is proposed to reduce finite-sample variance, with guarantees that the bias remains controlled; this yields substantial improvements in synthetic and financial ML tasks. The authors validate the theory across Brownian, fractional Brownian, CAR, and Heston models, and demonstrate practical gains in time-series classification, derivative pricing, and distributional regression, underscoring the approach’s broad applicability and potential for probabilistic interpretation of signature-based ML methods.

Abstract

The expected signature maps a collection of data streams to a lower dimensional representation, with a remarkable property: the resulting feature tensor can fully characterize the data generating distribution. This "model-free" embedding has been successfully leveraged to build multiple domain-agnostic machine learning (ML) algorithms for time series and sequential data. The convergence results proved in this paper bridge the gap between the expected signature's empirical discrete-time estimator and its theoretical continuous-time value, allowing for a more complete probabilistic interpretation of expected signature-based ML methods. Moreover, when the data generating process is a martingale, we suggest a simple modification of the expected signature estimator with significantly lower mean squared error and empirically demonstrate how it can be effectively applied to improve predictive performance.
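As a rough illustration of the "empirical discrete-time estimator" mentioned above, the following minimal sketch computes the first two signature levels of each discretely observed path (using its piecewise-linear interpolation) and averages them across paths. It is not the authors' code: the function names, the truncation at level 2, and the Brownian test data are all assumptions made for illustration, and the martingale-based modification described in the abstract is not implemented here.

```python
import numpy as np


def signature_up_to_level2(path):
    """Signature levels 1 and 2 of the piecewise-linear interpolation of one path.

    `path` has shape (n_obs, d): n_obs observation times, d channels.
    (Hypothetical helper, named for this sketch only.)
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)            # (n_obs - 1, d)
    level1 = increments.sum(axis=0)               # total increment, shape (d,)
    # Chen's identity over linear segments:
    # sum_k (X_k - X_0) (x) dX_k  +  1/2 * sum_k dX_k (x) dX_k
    displacement = path[:-1] - path[0]
    level2 = displacement.T @ increments + 0.5 * (increments.T @ increments)
    return level1, level2


def empirical_expected_signature(paths):
    """Average the truncated signatures over a collection of paths.

    `paths` has shape (n_paths, n_obs, d).
    """
    sigs = [signature_up_to_level2(p) for p in paths]
    mean_level1 = np.mean([s[0] for s in sigs], axis=0)
    mean_level2 = np.mean([s[1] for s in sigs], axis=0)
    return mean_level1, mean_level2


def simulate_brownian_paths(n_paths, n_steps, dim, horizon, seed=0):
    """Simulate standard Brownian motion on an equally spaced grid (test data only)."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    steps = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps, dim))
    start = np.zeros((n_paths, 1, dim))
    return np.concatenate([start, np.cumsum(steps, axis=1)], axis=1)


if __name__ == "__main__":
    paths = simulate_brownian_paths(n_paths=2000, n_steps=50, dim=2, horizon=1.0)
    m1, m2 = empirical_expected_signature(paths)
    print("level 1:", m1)
    print("level 2:\n", m2)
```

For standard Brownian motion on $[0, T]$ the level-2 average should be close to $\tfrac{T}{2} I$ and the level-1 average close to zero, which gives a quick sanity check on the sketch. How fast such averages approach the continuous-time expected signature as the grid is refined or the sample grows is the question the paper's convergence results address.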

Paper Structure

This paper contains 68 sections, 11 theorems, 261 equations, 3 figures, 5 tables, 5 algorithms.

Key Result

Theorem 2.8

Let $k=\max_{I\in\mathbf{I}}|I|$ and, for $m \geq 2$, set $p=mk$. Assume $\mathbb{X}$ is a canonical geometric stochastic process that satisfies one of the following: with and consider a signature-defining, cf. Definition def:sig_def_part, sequence of refining partitions $\{\pi_n, \ n\geq 1\}$ of the interval $[0, T]$ such that then the stronger convergence holds with rate $\mathcal{O}(\sum_{n

Figures (3)

  • Figure 1: Estimating the expected signature from a finite collection of discretely observed paths.
  • Figure 2: Distributions of expected signature estimators for BM. The $y$-axis is in log-scale.
  • Figure 3: Distributions of expected signature estimators for the Heston process with parameters $s_0 = 1, v_0 = 0.1, \theta = 0.1, \kappa = 0.6, \xi = 0.2$ and $\rho = -0.15$. The $y$-axis is in log-scale.

Theorems & Definitions (38)

  • Definition 2.1
  • Remark 2.2
  • Definition 2.3
  • Remark 2.4
  • Definition 2.5
  • Remark 2.7
  • Theorem 2.8
  • proof
  • Remark 2.9
  • Theorem 2.10
  • ...and 28 more