Learning with Expected Signatures: Theory and Applications
Lorenzo Lucchese, Mikko S. Pakkanen, Almut E. D. Veraart
TL;DR
The paper addresses the challenge of deriving probabilistic guarantees for model-free embeddings built from the expected signature of data streams. It introduces a unified theory linking discrete, pathwise signatures to their continuous-time counterparts under canonical geometric rough paths, proving consistency and asymptotic normality in both the in-fill and long-span regimes. A martingale-correction variant is proposed to reduce finite-sample variance while keeping the bias controlled; the authors report substantial improvements on synthetic and financial ML tasks. The theory is validated on Brownian, fractional Brownian, CAR, and Heston models, and practical gains are demonstrated in time-series classification, derivative pricing, and distributional regression, which supports a probabilistic interpretation of signature-based ML methods.
Abstract
The expected signature maps a collection of data streams to a lower-dimensional representation with a remarkable property: the resulting feature tensor can fully characterize the data-generating distribution. This "model-free" embedding has been successfully leveraged to build multiple domain-agnostic machine learning (ML) algorithms for time series and sequential data. The convergence results proved in this paper bridge the gap between the expected signature's empirical discrete-time estimator and its theoretical continuous-time value, allowing for a more complete probabilistic interpretation of expected signature-based ML methods. Moreover, when the data-generating process is a martingale, we suggest a simple modification of the expected signature estimator with significantly lower mean squared error and empirically demonstrate how it can be effectively applied to improve predictive performance.
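
To make the "empirical discrete-time estimator" concrete, the following is a minimal sketch, not the authors' implementation. It assumes each data stream is a NumPy array of shape (n_points, d), interpolates it linearly, truncates the signature at level 2, and averages across streams. The function names `signature_level2` and `expected_signature_level2` are hypothetical, and the martingale-corrected variant mentioned in the abstract is not reproduced here.

```python
import numpy as np


def signature_level2(path):
    """Levels 1 and 2 of the signature of the piecewise-linear path through `path`.

    `path` has shape (n_points, d). Illustrative helper, not from the paper.
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)  # shape (n_points - 1, d)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for dx in increments:
        # Chen's identity for appending one linear segment:
        # new level 2 = old level 2 + outer(old level 1, dx) + outer(dx, dx) / 2
        level2 += np.outer(level1, dx) + 0.5 * np.outer(dx, dx)
        level1 += dx
    return level1, level2


def expected_signature_level2(paths):
    """Sample average of the truncated signatures over a collection of paths."""
    sigs = [signature_level2(p) for p in paths]
    mean_level1 = np.mean([s[0] for s in sigs], axis=0)
    mean_level2 = np.mean([s[1] for s in sigs], axis=0)
    return mean_level1, mean_level2


if __name__ == "__main__":
    # Two toy 2-dimensional streams: an L-shaped path and its mirror image.
    path_a = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
    path_b = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    m1, m2 = expected_signature_level2([path_a, path_b])
    print("level 1:", m1)
    print("level 2:\n", m2)
```

In this sketch the in-fill regime corresponds to refining the sampling grid of each path, and the long-span regime to averaging over more observed segments. Higher truncation levels follow the same recursion but are omitted for brevity.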
